NATIONAL ADVISORY COMMITTEE 
FOR AERONAUTICS 



TECHNICAL NOTE 3053 


A NEW METHOD OF ANALYZING EXTREME-VALUE DATA



By Julius Lieblein





SUMMARY 


A new method is presented and proposed for analyzing extreme-value
data which may arise in a wide variety of applications.

Classical applications of statistical methods, which usually concern
average values, are inadequate when the quantity of interest is the
largest (or smallest) in a set of magnitudes. This is the situation in
a number of fields, for example, gust loads of an airplane in flight,
the highest temperatures or lowest pressures in meteorology, floods and
droughts in hydrology, breaking strengths in materials testing, breakdown
voltage of capacitors, and human life spans, in all of which applications
of methods for dealing with extremes have already been made.

Discussion of the proposed method is preceded by the necessary
statistical theory, which also furnishes a basis for evaluating the new
method in relation to existing ones. The techniques described provide
a simple means for estimating the necessary parameters, making
predictions from the fitted curve, estimating the reliability, and
evaluating the efficiency of the method in relation to other methods.
Moreover, these quantities are all produced by a single set of
computations involving just two work sheets. This background material
is not essential to an application of the method and may be omitted if
desired. The method itself is summarized for practical convenience,
illustrated step by step, and compared with present procedures. The
advantages of the proposed method are also discussed, chief among which
are:¹ (1) For the first time there is available an unbiased estimator of
known efficiency. (2) The proposed estimator appears to be more efficient
than a simplified form of the Gumbel estimator in many practical cases,
namely, for samples of about 20 or more and a probability level
P = 0.95 or more. The improvement in efficiency increases with increasing
P or increasing sample size. When compared with the original Gumbel
estimator, the proposed one is up to twice as efficient. (3) The
confidence intervals are found to a closer approximation and are in many
cases narrower than the ones in the Gumbel method.

Thus, while the Gumbel techniques are very useful in many cases, the
methods developed in this report will be of special interest to those who
must extract the greatest amount of information from a limited set of
costly data.

¹The technical terms used here are defined and discussed in the
main text.





Included in the report are several appendixes presenting mathematical
developments not given in the text.


INTRODUCTION


The statistical theory of extreme values has been found to have
wide applicability in many diverse fields, for example, meteorological
extremes, floods, droughts, breaking strength of textiles and other
types of materials, span of human life, gust loads experienced by an
airplane in flight, and breakdown voltage of capacitors.

The two existing methods of analyzing extreme-value data have
several limitations, discussed in the body of this report. One of these
methods is known as the method of maximum likelihood and has been
described by Kimball (refs. 1 and 2). The other, the method of moments,
has been developed by Gumbel (refs. 3 to 5), and its application to
gust-load problems has been discussed in detail in a previous NACA
Technical Note (ref. 6).

The present report gives a new method for dealing with the problem
of analyzing extreme measurements, treated in reference 6, which has 
certain advantages over the existing methods. The method of application 
is presented in detail, together with the necessary work sheets and other 
data, and the new method is compared with the method of moments previously 
in use. For definiteness, the discussion is at times presented in terms 
of application to gust loads, but the method is also applicable to other 
fields where extreme values occur. 

This work was conducted at the National Bureau of Standards under
the sponsorship and with the financial assistance of the National Advisory 
Committee for Aeronautics. 

The author has benefited greatly from the generous cooperation of a
number of persons in connection with the work embodied in the present
report. He wishes particularly to stress his gratitude to those persons
in the National Bureau of Standards whose painstaking efforts were
indispensable to the successful completion of the entire project: Dr. Dan
Teichroew of the Institute for Numerical Analysis for carrying out on IBM
equipment the empirical sampling procedures which made possible the
comparison of methods in the section "Theoretical Comparison" and in
appendix B; Miss Irene Stegun of the Computation Laboratory, under whose
supervision the basic computations described in appendix C were
performed; and to the following personnel of the Statistical Engineering
Laboratory: Mr. I. R. Savage for contributing appendix A, concerning the
nonexistence of sufficient statistics, and Mrs. L. S. Deming for her
success in producing the particularly effective form of the tables and
figures from unusually difficult material. Thanks are also due to
Mr. W. R. Knight, guest worker from Antioch College, for the very
numerous smaller computations.


SYMBOLS 


By "samples" are meant independent random samples from the
extreme-value distribution.

$a_i$, $b_i$    numerical quantities entering into weights of order-statistics estimator for sample of n and i = 1, 2, ..., n (see table I)

cov($\bar{y}$, s) or $\sigma(\bar{y}, s)$    covariance of mean and standard deviation in samples of n from reduced distribution

E( )    mathematical expectation (or mean value) of a quantity (see, e.g., eq. (6))

$E_m$, $E_n$    efficiency of order-statistics estimator for subgroups of m observations, or for samples of n (see table III(b))

E(s)    mean value of standard deviation in samples of n from reduced distribution

F(x)    probability (cumulative) distribution function of extreme-value distribution with two parameters, $F(x) = F(x; u, \beta) = \exp\left[-e^{-(x-u)/\beta}\right]$

f(x)    density (or frequency) function of extreme-value distribution F(x), dF(x)/dx (fig. 1)

k    number of equal subgroups of size m contained in sample of n

MSE( )    mean square error of ( ); equals variance plus square of bias

m    size of one of k equal subgroups contained in sample of n

m'    size of remainder subgroup in sample of n that is left after k equal subgroups of m are taken; that is, n = km + m'





n    sample size (N denotes sample size in Gumbel method)

P    probability level associated with a predicted value

$Q_T$    Cramér-Rao lower bound to variance of unbiased estimator of parameter $\xi_P$ (see $Q_0$)

$Q_m$, $Q_n$    variance of order-statistics subestimator for subgroups of m, or of estimator for sample of n

$Q_0$    numerator in Cramér-Rao lower bound (see table III(a))

$R(T_1, T_2)$    relative efficiency of estimator $T_1$ to $T_2$ (greater than unity when $T_1$ is more efficient), $\mathrm{MSE}(T_2)/\mathrm{MSE}(T_1)$

r    rank of rth observation (counted from smallest) in samples of n when arranged in ascending order from smallest to largest observation

s    standard deviation of sample of n from reduced distribution, $s = \left[\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2\right]^{1/2}$

$s_x$    standard deviation of sample of n from original distribution

$\bar{T}$    average of subestimators for k equal-size subgroups, $\bar{T} = \frac{1}{k}\sum_{i=1}^{k} T_i$

T'    subestimator for remainder subgroup (see m')

$T_i$    order-statistics subestimator for ith of k equal-size subgroups in samples of n with i = 1, 2, ..., k

t, t'    weights for $\bar{T}$ and T' in grand estimator for sample, $t\bar{T} + t'T'$





u    mode or location parameter of extreme-value distribution

$\tilde{u}$    Gumbel's original estimator of mode u for sample of n, $\bar{x} - (\bar{y}_n/\sigma_n)\,s_x$

u'    simplified expression used to represent Gumbel's estimator of mode u

x    random variable ("unreduced") having extreme-value distribution F(x)

$x_1, x_2, \ldots, x_n$    the n order statistics in sample of n, that is, the observations ranked in ascending order

$x_\lambda, x_\mu, x_\nu$    three selected order statistics in Mosteller method for very large samples of n ($0 < \lambda < \mu < \nu < 1$)

$\bar{x}$    sample mean in sample from original ("unreduced") distribution

y, $y_P$    reduced variate

$\beta$    scale parameter of extreme-value distribution F(x)

$\tilde{\beta}$    Gumbel's original estimator of $\beta$ for sample of n, $s_x/\sigma_n$

$\beta'$    simplified expression used to represent Gumbel's estimator of $\beta$

$\gamma$    Euler's constant, 0.5772156649

$\Delta'$    half-width of 68- and 95-percent confidence intervals when modified by probability factor

$\Delta$    half-width of 68-percent confidence interval in method of order statistics (table IX)

$\Delta_{x,n}$    half-width of 68-percent confidence interval in Gumbel method, $1.14\beta$

$\mu_x$    first moment or mathematical expectation of random variable x; E(x)





$\mu_2$    variance of reduced distribution, $\pi^2/6$

$\xi_P$    100P-percent point of extreme-value distribution F(x), $u + \beta y_P$

$\xi'_P$    simplified expression used to represent Gumbel estimator of $\xi_P$

$\sigma_s^2$    variance of standard deviation in samples of n from reduced distribution (table VIII)

$\sigma_x^2$, $\sigma_y^2$    population variance of x and y

$\phi_r$    plotting position of rth observation ranked from smallest, r/(n + 1)

$\Phi(y)$    cumulative distribution function of reduced extreme-value distribution, $\exp(-e^{-y})$


STATISTICAL THEORY

Extreme-Value Distribution and Meaning of Parameters 


The method of analysis presented herein is based upon the assumption
that the observed maxima to be analyzed are independent observations
from a statistical distribution of the form

$$F(x) = \exp\left[-e^{-(x-u)/\beta}\right] \qquad (1)$$


This is the cumulative (or ogive) form of the distribution, which
expresses the chance that an observed extreme value (gust load, for
example) will not exceed x in value. The more familiar concept of
the frequency or density function f(x) = F'(x) for this distribution
may be obtained by differentiation but is rather cumbersome (see
appendix A) and is not needed for present purposes. The general shape of
the density function f(x) is shown in figure 1. The meaning of the
various quantities indicated is explained below. A more detailed graph
for the case where the parameters are u = 0 and $\beta$ = 1 (the "reduced"
extreme-value distribution) is plotted in figure 2.





Distribution (1) has been studied extensively by Gumbel (refs. 3
to 5), among others, and is known as the asymptotic distribution of
largest values. It will be referred to briefly as the extreme-value
distribution. The significance of the term "asymptotic" is as follows:
If the underlying distribution of all (not merely the largest) gust
loads (e.g., effective gust velocity and normal acceleration) is
considered, then the largest values in repeated large samples from this
distribution have a distribution of their own which, as the sample size
becomes larger and larger, approaches closer and closer (in a certain
sense) to a limiting distribution. This limiting distribution is,
according to evidence presented in reference 6, of the form of
equation (1), with $1/\beta$ replacing the parameter $\alpha$ used in the reference.

The parameters of the extreme-value distribution are depicted in
figure 1. The quantity u is the mode or highest point of the (frequency)
distribution. The quantity $\beta$ is a scale parameter, analogous
to the standard deviation $\sigma$ in the case of the normal distribution.
In fact, $\beta$ equals $\sqrt{6}/\pi$ (about 0.78) times the standard deviation of
the extreme-value distribution.

Although the two parameters u and $\beta$ completely specify the
distribution, it is desirable to introduce another quantity $\xi = u + \beta y$,
which is a linear combination of the parameters u and $\beta$ (and therefore,
since known values will be assigned to y, itself a parameter)² and
makes it possible to estimate u and $\beta$ simultaneously, rather
than in terms of two separate problems. Thus, if $\xi$ can be estimated
as a + by with a and b known, then the values $\hat{u} = a$ and $\hat{\beta} = b$
can be read off at once.

The parameter $\xi$ has another highly important meaning. In
figure 1 the area P under the distribution to the left of the ordinate
erected at $\xi$ represents the probability that a value larger than $\xi$
will not occur. If $\xi$ is very large, then P very nearly equals the
whole area, unity, which means an observation is almost certain not to
exceed $\xi$; in other words, a larger value than $\xi$ will occur only very
rarely. Thus, if P = 0.99, then the corresponding value of $\xi$ has a
chance of only 0.01 of being exceeded. To denote this dependence of $\xi$
upon the probability P, a subscript is used: $\xi_P$. This parameter is
called a percentage point or the 100P-percent point of the extreme-value
distribution. If $\xi_P$ can be estimated for different probability levels
such as P = 0.90, 0.95, 0.99, and so forth, then these values are
precisely the predictions desired for, say, gust-load accelerations that
will be exceeded (on the average) only 10, 5, 1, and so forth,
respectively, times in 100.

²That is, the transformed parameters ($\xi$, $\beta'$), obtained from the
original parameters (u, $\beta$) by the linear transformation $\xi = u + \beta y$,
$\beta' = \beta$, are of concern. Attention will henceforth be given only to the
first parameter $\xi$, disregarding the second parameter $\beta'$ of the
transformed pair ($\xi$, $\beta'$). Whenever it should become necessary to refer
to $\beta'$, however, the prime will be dropped for simplicity. (See
footnote 7.)

The explicit relationship between $\xi_P$ and P can be determined
by means of formula (1) for the extreme-value distribution. If x is
put equal to $\xi_P$, then P, the probability of not exceeding this value,
is simply $F(\xi_P)$. Thus

$$P = F(\xi_P) = \exp\left(-e^{-y}\right) \qquad (2)$$

since $\xi_P = u + \beta y$. Hence, for a given (usually large) probability P,
the corresponding $\xi_P$ is obtained by finding y from relation (2), that
is, $y_P = -\ln(-\ln P)$, and then writing

$$\xi_P = u + \beta y_P \qquad (3)$$

where the subscript P has been added to y to denote dependence on P.
Comparison of the right members of equations (1) and (2) shows that the
quantity y bears the following simple relation to the corresponding
variable x in equation (1):

$$y = \frac{x - u}{\beta} \qquad (4)$$

or

$$x = u + \beta y \qquad (5)$$

Also, if in equation (1) one sets u = 0 and $\beta$ = 1, then x has the
same distribution as that given by the right-hand side of equation (2).
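As a minimal sketch of relations (1) to (5), the Python below solves (2) for the reduced variate, $y_P = -\ln(-\ln P)$, and then forms the 100P-percent point $\xi_P = u + \beta y_P$ of equation (3). The function names and the parameter values in the usage lines are illustrative assumptions, not quantities taken from this report.

```python
import math


def reduced_variate(P):
    """Solve relation (2), P = exp(-exp(-y)), for y_P = -ln(-ln P)."""
    if not 0.0 < P < 1.0:
        raise ValueError("P must lie strictly between 0 and 1")
    return -math.log(-math.log(P))


def percentage_point(u, beta, P):
    """100P-percent point xi_P = u + beta * y_P, equation (3)."""
    return u + beta * reduced_variate(P)


def extreme_value_cdf(x, u, beta):
    """Cumulative distribution F(x) of equation (1)."""
    return math.exp(-math.exp(-(x - u) / beta))


if __name__ == "__main__":
    u, beta = 1.0, 0.5  # hypothetical mode and scale parameter
    for P in (math.exp(-1.0), 0.90, 0.95, 0.99):
        xi = percentage_point(u, beta, P)
        # F(xi_P) reproduces P; P = 1/e gives y_P = 0, so xi_P equals the mode u
        print(f"P = {P:.5f}  y_P = {reduced_variate(P):8.5f}  "
              f"xi_P = {xi:8.5f}  F(xi_P) = {extreme_value_cdf(xi, u, beta):.5f}")
```

The round trip through F(x) is a quick consistency check on (2) and (3): whatever u and $\beta$ are chosen, substituting $\xi_P$ back into (1) returns P.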





In other words, y as defined by equation (4) or (5) has an
extreme-value distribution whose parameters have the extremely simple
values u = 0 and $\beta$ = 1. Thus y is called the reduced variate³ and is
perfectly analogous to the standardized variate $t = (x - \mu)/\sigma$ of
normal distribution theory. The distribution of y in equation (2),
called the reduced distribution, has been tabulated in table 2 of
reference …, which also contains a table of the inverse function as well
as a number of other tables related to the application of extreme-value
theory.

From the above discussion it is evident that the solutions of both
the problems of estimation and prediction are embodied in the one
quantity $\xi_P = u + \beta y_P$. Estimation of this quantity will be one of the
main objectives of the remainder of this report.


Determination of Method of Estimation

To avoid confusion, a distinction is made between a function of
sample variables $x_1, x_2, \ldots, x_n$, such as the sample mean
$g(x_1, x_2, \ldots, x_n) = \bar{x} = (x_1 + x_2 + \cdots + x_n)/n$, and the numerical
values $g^\circ = g(x_1^\circ, x_2^\circ, \ldots, x_n^\circ)$ assumed by the function when the
actual values of the observations are substituted into the
function. If the function is used to estimate a parameter, it will be
called an estimator of the parameter; the particular numerical value
assumed in a given case will be called an estimate.

In searching for estimators, the first step is to seek what are
known as sufficient statistics. A definition of this concept may be
found in any advanced text on statistical theory, for example,
reference 8 (vol. II, p. 81); but the feature of importance here is that,
given a set of joint sufficient statistics, that is, certain functions
of the sample observations, it is often possible to deduce from them
an estimator with certain desirable properties, provided that the number
of such functions does not depend upon sample size. If it turns
out that the only set of sufficient statistics is the trivial set
consisting of the n functions $t_i(x_1, \ldots, x_n) = x_i$, i = 1, ..., n,
that is, the n sample observations themselves, then obviously this
furnishes no guide whatever for constructing functions of the x's
which are optimum estimators.


³The variate x is sometimes referred to as the original or
"unreduced" variate.





Investigation reveals that, unfortunately, joint sufficient
statistics do not exist for the two parameters of the extreme-value
distribution. A proof of this fact (which was conjectured by Kimball,
ref. 1, p. 299) has been discovered by Mr. I. Richard Savage of the
Statistical Engineering Laboratory of the National Bureau of Standards
and is presented in appendix A.

It may be noted that Kimball (ref. 1) has studied a broader concept
called "set of statistical estimation functions" whereby the estimators
of the parameters are given, not by explicit formulas involving only the
sample values, but implicitly as the solutions of a set of simultaneous
equations, for example, the classical maximum-likelihood equations.
Unfortunately, such estimators do not seem to lend themselves to the
procedure referred to above for constructing optimum estimators, and
there seems to be no analytical means of accurately evaluating the
important characteristics of bias and efficiency, defined below, for
such estimators in the case of finite samples. (Although these
estimators may be asymptotically optimum, i.e., for infinitely large
samples, this need not be the case for samples of finite size.)

A second method of approach to the problem of estimation is the
classical one known as the method of moments. In the case of the
extreme-value population this method is as follows:

The first two moments of the extreme-value population (1) are

$$\mu_x = E(x) = u + \beta E(y) \qquad (6)$$

$$\sigma_x^2 = E\left[x - E(x)\right]^2 = \beta^2 E\left[y - E(y)\right]^2 \qquad (7)$$

where y has the reduced extreme-value distribution (2), E denotes
mathematical expectation, and $\sigma_x^2$ is the variance, the second moment
about the mean. Using the moments of the reduced distribution (see,
e.g., ref. 8, vol. I, p. 221),

$$E(y) = \gamma = 0.577216 \quad \text{(Euler's constant)} \qquad (8)$$

$$\sigma_y^2 = \mu_2 = \frac{\pi^2}{6} = 1.644934 \qquad (9)$$





there are obtained

$$\mu_x = u + \gamma\beta \qquad (10)$$

$$\sigma_x = \frac{\pi\beta}{\sqrt{6}} \qquad (11)$$

relations which express the population moments in terms of the
population parameters. Therefore, if one had good estimators of the
population moments, the parameters could readily be found. This fact
constitutes the essence of the method of moments. It consists in
treating the sample as an adequate representation of the population,
replacing the population moments in the expressions which relate them to
the parameters by the corresponding sample moments, for example, $\mu_x$ by
the sample mean $\bar{x}$, and $\sigma_x$ by the sample standard deviation
$s_x = \left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{1/2}$. This gives
$\bar{x} = u + \gamma\beta$ and $s_x = \pi\beta/\sqrt{6}$, which yield the moment
estimators of the parameters:

$$\text{For } \beta: \quad \hat{\beta} = \frac{\sqrt{6}}{\pi}\, s_x$$

$$\text{For } u: \quad \hat{u} = \bar{x} - \gamma\hat{\beta} \qquad (12)$$


These are essentially the estimators which form the basis of Gumbel's
method (ref. 5, lect. 3, eq. (3-29), with $u_n = \hat{u}$ and $1/\alpha_n = \hat{\beta}$).⁴

⁴The actual estimators used in the Gumbel method are slightly more
complicated (ref. 5, lect. 3, eq. (3-39)), but the difference is not
important at this point. (See appendix B.)





This method is justified by the fact that under general conditions the
estimator functions $\hat{u} = \bar{x} - \gamma\hat{\beta}$ and $\hat{\beta} = (\sqrt{6}/\pi)\,s_x$
in equations (12) approach (in a certain sense) the values of the
corresponding parameters u and $\beta$ as the sample size becomes infinite.
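The moment estimators (12) can be sketched in a few lines. The random draws below assume inverse-transform sampling: if E is a unit exponential variate, then $y = -\ln E$ has the reduced distribution (2), and $x = u + \beta y$ follows from (5); Python's `random.expovariate` supplies E. The sample standard deviation uses divisor n, as the method of moments prescribes. The parameter values, seed, and sample sizes are illustrative assumptions; the printed estimates should drift toward u and $\beta$ as n grows, which is the large-sample justification just given, and nothing here measures their small-sample bias.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma


def draw_extreme_values(u, beta, n, rng):
    """n independent draws x = u + beta*y, with y = -ln(E) and E ~ Exp(1)."""
    values = []
    for _ in range(n):
        e = rng.expovariate(1.0)
        while e == 0.0:  # guard against a zero draw before taking the log
            e = rng.expovariate(1.0)
        values.append(u + beta * -math.log(e))
    return values


def moment_estimates(sample):
    """Method-of-moments estimators of equations (12): returns (u_hat, beta_hat)."""
    n = len(sample)
    mean = sum(sample) / n
    s_x = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)  # divisor n
    beta_hat = math.sqrt(6.0) / math.pi * s_x
    u_hat = mean - EULER_GAMMA * beta_hat
    return u_hat, beta_hat


if __name__ == "__main__":
    rng = random.Random(3053)
    for n in (10, 100, 10000):
        # hypothetical population: u = 10, beta = 2
        print(n, moment_estimates(draw_extreme_values(10.0, 2.0, n, rng)))
```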

This method has apparently given satisfactory results in practice.
It is, however, subject to an important limitation. In studying
estimators it is highly desirable to know something about their
probability distributions: if not the exact density functions, then at
least their means and variances. The mean value (mathematical
expectation) of an estimator indicates whether on the average the
estimates given by it are too high or too low relative to the actual
value of the parameter estimated; in other words, whether there is any
bias in using the estimator. Similarly, the variance indicates how much
the estimates scatter among themselves and is the basis for constructing
a measure of efficiency which makes it possible to compare the
performances of different estimators. A more useful concept for some
purposes than variance is mean square error, which measures how far the
estimates deviate, on the average, not from their own mean but from the
quantity (the parameter) which they are supposed to measure. There is a
simple relationship between variance and mean square error, namely,

$$\text{Mean square error} = \text{Variance} + (\text{Bias})^2 \qquad (13)$$


Thus, for unbiased estimators, variance and mean square error are
identical, and for brevity the term "variance" will be used in such cases.
But it should be remembered that the concept in view is actually the
mean square error. This becomes especially important later when biased
estimators are discussed (appendix B) and variance and mean square error
are no longer identical.
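Relation (13) can be checked numerically. The sketch below applies the moment estimator of $\beta$ from (12) to simulated samples of n = 5 from the reduced distribution scaled by $\beta$ (u = 0), and computes the empirical mean square error, variance, and bias over many replications. Because all three averages are taken over the same replications with the same divisor, the identity holds exactly up to rounding. The sample size, replication count, seed, and helper names are arbitrary choices for illustration.

```python
import math
import random


def beta_moment_estimate(sample):
    """beta_hat = (sqrt(6)/pi) * s_x with divisor n, as in equations (12)."""
    n = len(sample)
    mean = sum(sample) / n
    s_x = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return math.sqrt(6.0) / math.pi * s_x


def reduced_draw(rng):
    """One reduced extreme-value variate: y = -ln(E) with E ~ Exp(1)."""
    e = rng.expovariate(1.0)
    while e == 0.0:  # avoid log(0) on the (vanishingly rare) zero draw
        e = rng.expovariate(1.0)
    return -math.log(e)


def mse_decomposition(beta_true, n, reps, seed=0):
    """Empirical (MSE, variance, bias) of beta_hat over `reps` samples of size n (u = 0)."""
    rng = random.Random(seed)
    estimates = [
        beta_moment_estimate([beta_true * reduced_draw(rng) for _ in range(n)])
        for _ in range(reps)
    ]
    mean_est = sum(estimates) / reps
    variance = sum((t - mean_est) ** 2 for t in estimates) / reps
    mse = sum((t - beta_true) ** 2 for t in estimates) / reps
    return mse, variance, mean_est - beta_true


if __name__ == "__main__":
    mse, variance, bias = mse_decomposition(beta_true=1.0, n=5, reps=4000, seed=7)
    print(f"MSE = {mse:.5f}, variance + bias^2 = {variance + bias ** 2:.5f}, bias = {bias:.5f}")
```

For n = 5 the simulated bias of $\hat{\beta}$ comes out negative, so in this illustration the mean square error exceeds the variance, which is exactly the situation in which the two concepts must be kept apart.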

If one tries to determine the mean (or expected) values of the
estimators $\hat{u}$ and $\hat{\beta}$ in equations (12), it is found that statistically
these functions are quite complicated, leading to very difficult multiple
integrals which apparently can be evaluated accurately only by large-scale
numerical integration.⁵ This difficulty evidently persists if one is
interested in the parameter $\xi_P = u + \beta y_P$ instead of in u or $\beta$
separately.

⁵Shorter methods of limited accuracy are possible and have been
used in this report for comparison purposes. (See the section
"Theoretical Comparison" and appendix B.)




Order-Statistics Approach for Small Samples

Apparently the only method of estimation which avoids the
difficulty of complicated calculations is the method of order statistics.

If the values in a sample of n observations are arranged in,
say, increasing order of size and denoted by $x_1, x_2, \ldots, x_n$, with
$x_1 \le x_2 \le \cdots \le x_n$, then these values $x_i$ are called order statistics.
The smallest is called the first order statistic; the middle one (if n
is odd), the median; the one which is one-fourth the way up from the
bottom, the first quartile; and so forth. (If there are several equal
ones, then suitable modifications are made in the definitions.) There
is an extensive literature on this subject, chief among which is the
comprehensive survey in reference 9.

Order statistics provide rapid and practical methods of analyzing
data. The range $x_n - x_1$ is a very common illustration from quality
control. It is simply the difference of two order statistics, the
largest and smallest, and its properties have been extensively studied
for samples from the normal distribution. The range has been found to
yield estimates of the standard deviation of the population that often
compare very favorably with the theoretically best obtainable. More
general linear functions, $c_1x_1 + c_2x_2 + \cdots + c_nx_n$, which give weight
to every sample value, have also been studied (ref. 10), and values of
the coefficients have been found which make it possible to estimate very
simply and remarkably well certain quantities which previously were
obtained only by more complicated calculations.

This procedure will be carried over and extended to the case of
samples from the extreme-value distribution (1). The method will in
many respects follow the general approach used in reference 10 for
several other distributions. The aim is to determine the weights $w_i$,
i = 1, 2, ..., n, for all the n order statistics in a sample of
size n so that the linear estimator

$$L = \sum_{i=1}^{n} w_i x_i \qquad (14)$$

has the properties desired, namely:

(1) The mathematical expectation equals the parameter to be estimated;
that is, the estimator is unbiased:

$$E(L) = \xi_P \qquad (15)$$





(2) The mean square error (MSE), which in this case is the same as
the variance, is as small as possible, consistent with condition (1):

$$\mathrm{MSE}(L) = \sigma^2(L) = E\left[L - E(L)\right]^2 = \text{a minimum} \qquad (16)$$

An estimator L which satisfies these two conditions will be
denoted by $\hat{\xi}_P$, a notation suggested by condition (1).

Condition (2) is equivalent to saying that the estimator $\hat{\xi}_P$ is
as efficient as possible under the given conditions. This concept will
be discussed below.

The mathematical formulation of this minimum-variance problem is
developed in appendix C, and the solutions (the weights) are shown in
table I for n = 2 to n = 6. The case for n greater than 6 is
discussed in the next section. For each given value of n, n weights
$w_1, w_2, \ldots, w_n$ are determined that depend on the quantity $y_P$ that
occurs in the parameter $\xi_P = u + \beta y_P$ to be estimated. The weights
are each of the following form:

$$w_i = a_i + b_i y_P, \qquad i = 1, 2, \ldots, n \qquad (17)$$


Substituting these weights for given n into equation (16) actually
gives the minimum value that the variance can attain under the
above conditions, and this value depends upon $y_P$ quadratically:

$$Q_n = \sigma^2_{\min} = \left(A_n + B_n y_P + C_n y_P^2\right)\beta^2 \qquad (18)$$

Table I gives the values $a_i$, $b_i$, $A_n$, $B_n$, and $C_n$, which have been
found by exact computation methods as indicated in appendix C and
table II. The quantities $Q_n$ are shown in table III(a).
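For the smallest case, n = 2, the weights can be worked out directly, which shows where the form (17) comes from. Writing $m_i = E(y_{(i)})$ for the expected reduced order statistics, $E(x_{(i)}) = u + \beta m_i$ by (5), so condition (15) for every u and $\beta$ requires $\sum w_i = 1$ and $\sum w_i m_i = y_P$; with (17) this means $\sum a_i = 1$, $\sum a_i m_i = 0$, $\sum b_i = 0$, and $\sum b_i m_i = 1$. For n = 2 these constraints leave no freedom for the minimization in (16), so they alone fix $a_i$ and $b_i$; for n of 3 or more the variance minimization of appendix C is needed and is not attempted here. The sketch below computes $m_i$ by Simpson's rule over the order-statistic density, truncated (an assumption) to $-6 \le y \le 40$, and can be compared with the closed forms $m_2 = \gamma + \ln 2$ and $m_1 = \gamma - \ln 2$ (the larger of two reduced variates is itself extreme-value distributed with its mode shifted by ln 2). All function names are hypothetical, and no claim is made here that the output reproduces table I.

```python
import math

EULER_GAMMA = 0.5772156649015329


def reduced_cdf(y):
    return math.exp(-math.exp(-y))


def reduced_pdf(y):
    return math.exp(-y - math.exp(-y))


def expected_order_statistic(i, n, lo=-6.0, hi=40.0, steps=20000):
    """m_i = E(y_(i)) for the ith smallest of n reduced variates (Simpson's rule)."""
    coef = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        F = reduced_cdf(y)
        integrand = y * coef * F ** (i - 1) * (1.0 - F) ** (n - i) * reduced_pdf(y)
        simpson = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += simpson * integrand
    return total * h / 3.0


def weights_n2():
    """Unbiased weights w_i = a_i + b_i*y_P for n = 2 (unique, hence also minimum-variance)."""
    m1 = expected_order_statistic(1, 2)
    m2 = expected_order_statistic(2, 2)
    a2 = -m1 / (m2 - m1)  # from sum a_i = 1 and sum a_i m_i = 0
    b2 = 1.0 / (m2 - m1)  # from sum b_i = 0 and sum b_i m_i = 1
    return (1.0 - a2, a2), (-b2, b2)


def xi_hat(x_sorted, y_P, a, b):
    """Linear estimator (14) with weights (17): f_1 + y_P * f_2."""
    return sum((ai + bi * y_P) * xi for ai, bi, xi in zip(a, b, x_sorted))


if __name__ == "__main__":
    a, b = weights_n2()
    print("a =", a, "b =", b)
    print("m_2 numeric vs closed form:",
          expected_order_statistic(2, 2), EULER_GAMMA + math.log(2.0))
```

Setting $y_P = 0$ in `xi_hat` gives the estimator of the mode u, while $\sum b_i x_i$ alone is the coefficient of $y_P$, anticipating the treatment of $\beta$ below.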

As the sample size increases, the estimation is expected to improve
and the variance to diminish. In order to have a convenient standard
of comparison, in the case of unbiased estimators,⁶ all variances are
scaled by dividing them into a theoretically specified variance known
as the "Cramér-Rao lower bound" (ref. 11, p. 480, eq. (32.3.3a)),⁷ which
is less than or at most equal to the variance of any (unbiased) estimator
of the parameter in question.⁸ The result is then an absolute number
between 0 and 1 which, when expressed as a percentage, is called the
efficiency of the estimator for samples of n:

$$\text{Efficiency}(L) = E_n(L) = \frac{Q_T}{Q_n} \qquad (19)$$

The quantities $E_n$, which evidently depend upon $y_P$, and therefore
upon P, are given for n = 2 to 6 for selected values of the
probability P in table III(b). Table III(a) contains the numerical values
of the variances $Q_n$ and the lower bound $Q_T$ in terms of the
parameter $\beta$. The expression for $Q_T$ has been implicitly given in
reference 2, page 113, and is indicated in the first footnote to
table III(a) of this report.


⁶For biased estimators, see appendix B.

⁷This Cramér-Rao bound is given for the case where the distribution
has only one parameter to be estimated. For the extreme-value
distribution with the two parameters ($\xi$, $\beta$), $\beta$ can be regarded as a
"nuisance parameter" and a "Cramér-Rao bound" thus obtained for $\xi$, the
expression for which will involve $\beta$ (see first footnote to
table III(a)). This procedure is based on the "method of nuisance
parameters" discussed in reference 12. (See also footnote 2 in text.)

⁸There may or may not exist estimators whose variances reach the
lower limit $Q_T$. If (as may happen) there exists a $Q' > Q_T$ such
that the variance of every estimator is at least $Q'$ (and, of course,
greater than $Q_T$), then $Q'$ may be substituted for $Q_T$ in the numerator
of the expression for efficiency (19) without the fraction exceeding 1.
The investigation of the existence of $Q'$ is too complex a matter for
the purposes of this report. However, the only effect of using a lower
bound $Q_T$ which is too low is to understate the efficiency, so that
the results are on the safe, conservative side.





Two points should be noted about the choice of probability levels
shown in table III. The value P = 0.36788 = 1/e, which corresponds
to $y_P = 0$, is important because it gives the mode, one of the desired
parameters of the distribution. This is evident from the fact that the
parameter being estimated is $\xi_P = u + \beta y_P = u$, the mode, for $y_P = 0$.

Similarly, the limiting value P = 1 corresponds to the scale
parameter $\beta$. This may be seen as follows: If P approaches 1, the
values of $\xi_P$ and $y_P$ both become indefinitely large, but their ratio
$\xi'_P = \xi_P/y_P = (u/y_P) + \beta$, considered to be a new parameter,
approaches $\beta$, since the mode u remains fixed and finite (as does
also $\beta$). Hence $\beta$ may be estimated by first estimating $\xi'_P$ for
arbitrary P and then letting P approach 1.
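The limiting argument can be seen numerically. The sketch below evaluates $\xi'_P = \xi_P/y_P$ for P increasingly close to 1, using hypothetical values u = 10 and $\beta$ = 2. The excess over $\beta$ is exactly $u/y_P$, and since $y_P$ grows only like $-\ln(1 - P)$ as P approaches 1, the approach to $\beta$ is slow; the function name is an illustrative assumption.

```python
import math


def scaled_percentage_point(u, beta, P):
    """xi'_P = xi_P / y_P = u / y_P + beta (defined for y_P != 0, i.e., P != 1/e)."""
    y_P = -math.log(-math.log(P))
    return (u + beta * y_P) / y_P


if __name__ == "__main__":
    u, beta = 10.0, 2.0  # hypothetical parameters
    for P in (0.9, 0.99, 0.9999, 1.0 - 1e-12):
        print(f"P = {P!r:<22} xi'_P = {scaled_percentage_point(u, beta, P):.4f}")
```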

Now from equations (14) and (17), the linear estimator $L = \hat{\xi}_P$ is
of the form

$$\hat{\xi}_P = f_1 + y_P f_2 \qquad (20)$$

where $f_1$ and $f_2$ are functions of the sample values which do not
involve $y_P$. By the preceding remark, the parameter $\beta$ can then be
estimated by writing down the corresponding estimator of $\xi'_P$,

$$\hat{\xi}'_P = \frac{\hat{\xi}_P}{y_P} = \frac{f_1}{y_P} + f_2,$$

and letting P approach 1, obtaining

$$\hat{\beta} = f_2$$

as the corresponding estimator of $\beta$.

In other words, an estimator $\hat{\beta}$ of $\beta$ may be obtained by simply
taking the coefficient of $y_P$ in $\hat{\xi}_P$ when written in the form of
equation (20). Similarly, the variance of $\hat{\beta}$ is the coefficient
of $y_P^2$ in the variance of $\hat{\xi}_P$. This may readily be seen as follows.
From equation (20),

$$\sigma^2(\hat{\xi}_P) = \sigma^2(f_1) + 2y_P\,\mathrm{cov}(f_1, f_2) + y_P^2\,\sigma^2(f_2) = A + By_P + Cy_P^2$$

where A, B, and C are quantities which do not involve $y_P$ (though
they may involve $\beta$, in general); thus, as P approaches 1 and $y_P$
increases without limit,

$$\sigma^2(\hat{\xi}'_P) = \frac{\sigma^2(\hat{\xi}_P)}{y_P^2} = \frac{A}{y_P^2} + \frac{B}{y_P} + C \;\to\; C = \sigma^2(f_2),$$

the coefficient of $y_P^2$ in $\sigma^2(\hat{\xi}_P)$. From this it follows also that the
efficiency of the estimator $\hat{\beta}$, being a ratio of variances, is simply
the ratio of the coefficients of $y_P^2$, the other terms being disregarded.

These facts applied to the estimator $\hat{\xi}_P$ make it possible to avoid
a separate treatment for the two parameters u and $\beta$. Their estimators
are each represented by a single line in a table (such as table III)
showing values for various probability levels: P = 0.36788 (or $y_P = 0$)
gives u; P = 1 (or $y_P = \infty$) gives $\beta$.

The concepts of variance and efficiency have also a more concrete,
practical significance. The lower bound to the variance $Q_T$ has the
form $Q_T = Q_0/n$, where $Q_0$ is a quadratic function of $y_P$ but is
independent of sample size n. For two samples of sizes n' and n'',
the variances $Q_T'$ and $Q_T''$ are in the ratio

$$\frac{Q_T'}{Q_T''} = \frac{n''}{n'}$$

that is, inversely proportional to sample size. Similarly, if there
were two estimators for the same sample size, the ratio of their
variances could be formed and thought of as representing a ratio (inverse)
of (hypothetical) sample sizes. Thus, if, for a sample of 20, the
variance Q' of one estimator were one-half the variance Q'' of an
alternative estimator, then the first estimator would require a sample
of only 10 to give as much information as could be obtained with the
second from a sample of 20. This saving of half the number of
observations is expressed by saying that the first estimator is twice as
efficient as the second. In general, a saving of the fraction p of the
observations makes one estimator 1/(1 - p) times as efficient as a
second.
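The sample-size reading of efficiency reduces to a few ratios. The helpers below, whose names are hypothetical, encode the efficiency (19), the relative efficiency $R(T_1, T_2) = \mathrm{MSE}(T_2)/\mathrm{MSE}(T_1)$ from the list of symbols, and the equivalent-sample-size interpretation, assuming, as for $Q_T = Q_0/n$, that variance is inversely proportional to sample size. The usage lines reproduce the worked example in the text: a variance ratio of one-half at n = 20 corresponds to n = 10 and to a saving fraction p = 1/2, that is, twice the efficiency.

```python
def efficiency(lower_bound, variance):
    """Equation (19): E_n = Q_T / Q_n, a number between 0 and 1."""
    return lower_bound / variance


def relative_efficiency(mse_t1, mse_t2):
    """R(T1, T2) = MSE(T2) / MSE(T1); greater than 1 when T1 is more efficient."""
    return mse_t2 / mse_t1


def equivalent_sample_size(n, rel_eff):
    """Size at which the more efficient estimator matches the other at size n,
    assuming variance proportional to 1/n."""
    return n / rel_eff


def efficiency_from_saving(p):
    """Saving the fraction p of the observations means 1/(1 - p) times as efficient."""
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    q_first, q_second = 0.5, 1.0  # Q' = Q''/2, as in the example in the text
    r = relative_efficiency(q_first, q_second)
    print(r, equivalent_sample_size(20, r), efficiency_from_saving(0.5))  # 2.0 10.0 2.0
```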

The efficiencies of the estimators $\hat{\xi}_P$ in table III are more
conveniently compared in graphical form, as in figure 3. The heavy
horizontal line at the top indicates perfect or 100-percent efficiency,
and the rising curves as n increases show how closely the estimator is
approaching the standard of perfection. The most outstanding fact is
that, in marked contrast with a theoretical, perfect estimator, the
efficiency of the actual estimator $\hat{\xi}_P$ depends upon the probability P,
being the largest for the middle range 0.40 to 0.60 and dropping
considerably at the ends near 0 and 1. Since analysis of extreme (largest)
data is concerned chiefly with the larger magnitudes associated with
very small probabilities of occurring or of being exceeded, interest
will be limited here to the range above P = 0.90. For n = 6 the
efficiency exceeds the 80-percent level for all values of P in this
range that are apt to occur in practice (i.e., P < 0.999). In view of
the satisfactory values of efficiency, further calculation for n > 6
did not appear warranted at this time, particularly since it became
apparent that the labor of computation would increase out of all
proportion to the rapidly diminishing improvement in efficiency.

Of course, most samples of observations are larger than the trivial size of 6, and the question arises how to handle the larger samples. This is treated in the next section.


Extension to Larger Samples

The key to handling samples with more than six observations is to treat them as sets or subgroups of samples of 6 (or, if necessary, 5). If a sample size is not an exact multiple of 6 or of 5, then the sample may be treated as consisting either of subgroups of 6 with an odd group remaining having less than 6 items, or of subgroups of 5 with a remaining group of 6. The simpler case where n is an exact multiple of 5 or 6 will be dealt with first.



Case I - Sample size an exact multiple of 5 or 6. - Suppose, in general, n = km, where m is the size of the subgroup, which need not be 6, and k is the number of subgroups in the sample. If the sample is so divided into subgroups that the observations in one subgroup may be considered to be statistically independent of those in any other subgroup, then it is legitimate to treat the sample as consisting of k independent subsamples, each of size m.


One way of obtaining independent groups is by use of random numbers. This, however, will lose valuable information embodied in the order in which the data were actually observed. If the data are truly random, so that, for example, there are no seasonal effects, then this implies that subgroups formed in the order in which the data are observed (the first m values observed put into the first group, the next m into the second, and so forth) should be independent. This assumption, of course, underlies the entire method of estimation described in this report, and it will be adopted in the procedures.
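Forming subgroups in the order of observation can be sketched in Python as follows. The function name is hypothetical, the data in the usage lines are made up, and putting any observations left after the first km values into a trailing remainder subgroup is an assumption made for illustration. Each subgroup is sorted into increasing order, as the order-statistics weights require.

```python
from typing import List, Sequence, Tuple


def consecutive_subgroups(data: Sequence[float], k: int,
                          m: int) -> Tuple[List[List[float]], List[float]]:
    """Form k subgroups of m in the order of observation (first m values ->
    subgroup 1, next m -> subgroup 2, ...), each sorted into increasing order.
    Whatever follows the first k*m values is returned as the remainder subgroup."""
    if k * m > len(data):
        raise ValueError("fewer than k*m observations")
    groups = [sorted(data[i * m:(i + 1) * m]) for i in range(k)]
    remainder = sorted(data[k * m:])
    return groups, remainder


# Made-up observations, listed in the order in which they were taken.
obs = [1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.4, 0.95, 1.05, 1.15, 1.25, 0.85, 1.6, 0.7]
groups, rem = consecutive_subgroups(obs, k=2, m=6)
print(groups, rem)
```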

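From each subgroup form the "subestimator"

$$T_i = \sum_{j=1}^{m} w_j x_{ij}, \qquad i = 1, 2, \ldots, k \qquad (21)$$

where $x_{i1} \le x_{i2} \le \cdots \le x_{im}$ are the observations of the ith subgroup arranged in increasing order, and the weights $w_1, w_2, \ldots, w_m$ are those taken from table I for sample size m and are the same for each subgroup of m values (but, of course, are different for different sizes m). These k subestimators $T_i$ are then combined by simple averaging to form the grand sample estimator:

$$T = \frac{1}{k}\sum_{i=1}^{k} T_i \qquad (22)$$

The variance of this estimator is simply

$$\sigma^2(T) = \frac{Q_m\beta^2}{k} \qquad (23)$$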





since the variance is being taken of a mean of k independent quantities each of which has the same variance,⁹ $\sigma^2(T_i) = Q_m\beta^2$, where $Q_m$ denotes the variance tabulated in table III(a) for m = 2, 3, 4, 5, and 6.

The efficiency of T is, since n = km and the $T_i$'s, and therefore T, are unbiased,

$$\mathrm{Eff}(T) = \frac{Q_0\beta^2}{\sigma^2(T)} = \frac{Q_0'/km}{Q_m/k} = \frac{Q_0'}{mQ_m}$$

where $Q_0 = Q_0'/n = Q_0'/km$, and $Q_0'$ is independent of n. Thus one has the important fact that, if a sample is broken into equal-size subgroups, the efficiency of the order-statistics estimator depends only upon the size m of the subgroup (and, of course, on P).

Since, according to table III(b), efficiency increases with sample (or subgroup) size, it follows that when there is a choice, a sample should be broken into subgroups as large as possible for best efficiency, that is, into subgroups of 6. If this is not possible, but the sample size n is an exact multiple of 5, then subgroups of 5 may be used without much loss in efficiency. The last two columns in table III(b) show that the loss is 2.4 percent at P = 0.95 and rises to a maximum of 3.8 percent for the limiting value P = 1.

Case II - Sample size not an exact multiple of 5 or 6. - In most cases, of course, the sample size will have a remainder when divided by both 5 and 6. There is then a great variety of choices as to how to partition n into subgroups of 6 and 5 and perhaps other sizes.

Many of these possibilities have been examined, the aim being to establish rules as simple as possible without too great a loss in efficiency. Fortunately, most of the methods of partitioning a sample of given size n do not lead to greatly different efficiencies. Thus the following rules can be laid down for n ≥ 7 (n ≤ 6 does not involve breaking into subgroups):

(a) n = 7 up to large values: (1) Use the partition n = 6k + m' if m' = 2, 3, 4, 5. If m' = 1, use n = 5k + m"; if also m" = 1, so that n = 31, 61, 91, and so forth, that is, a multiple of 30 plus 1,



⁹These variances are equal because they depend only upon m, P, and β, which are constant for all the subgroups of the same sample.





then (2) write n = 30k + 1 = (30k - 5) + 6 = (6k - 1) × 5 + 6; that is, split the sample into 6k - 1 subgroups of 5 and a remainder subgroup of 6.

(b) n extremely large: If the sample size is of the order of several hundred or more, so that the number of subgroups is of the order of 50 or 100, then the computation becomes increasingly laborious. For such very large samples of extremes, which are rather rare, a short-cut method is available which is explained in appendix D. While its efficiency is substantially less than that of the longer method presented here, it is nevertheless of practical value inasmuch as the loss in efficiency, which in practical terms means an effective loss in number of observations, is not very important when a very extensive amount of data happens to be available.

The variance and efficiency of an estimator for most sample sizes (rule (a)) can readily be discussed in general terms. Assume that n = km + m' represents the separation of the sample into two parts, one consisting of k equal subgroups of size m = 5 or 6 and the other consisting of the remainder subgroup of size m' < m, except for the exceptional case where m = 5 and m' = 6 (case II, rule (a)(2)). The average, T, is formed from the first part as described under case I. Then a subestimator T' is formed from the remainder subgroup of m' values using the weights $w_i'$ for samples of size m':


$$T' = \sum_{i=1}^{m'} w_i' x_i' \qquad (25)$$

where $x_i'$, i = 1, 2, ..., m', denotes the m' values in the subgroup. Finally a weighted average of T and T' is formed, and this is the grand sample estimator $\xi_P^*$:

$$\xi_P^* = tT + t'T' \qquad (26)$$





where the multipliers¹⁰ are

$$t = \frac{km}{n}, \qquad t' = \frac{m'}{n} = 1 - t \qquad (27)$$


Since all the subgroups are independent, so are T and T'; whence

$$\sigma^2(\xi_P^*) = \frac{t^2}{k}\,Q_m\beta^2 + (t')^2\,Q_{m'}\beta^2$$

since the variance of the mean, T, is $Q_m\beta^2/k$.
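Once the partition is known, the variance formula can be evaluated mechanically. In this Python sketch (hypothetical function name), the tabulated quantities $Q_m$ and $Q_{m'}$ are left as inputs because table III is not reproduced here. Setting one of them to 1 and the other to 0 isolates the multipliers $t^2/k$ and $(t')^2$.

```python
def variance_coefficient(k: int, m: int, m_rem: int,
                         q_m: float, q_rem: float = 0.0) -> float:
    """Coefficient of beta**2 in the variance of xi_P* = tT + t'T' for
    n = k*m + m_rem, with t = k*m/n and t' = m_rem/n (equation (27)):
        (t**2 / k) * Q_m  +  t'**2 * Q_m'
    With no remainder subgroup (case I) it reduces to Q_m / k."""
    n = k * m + m_rem
    t = k * m / n
    t_rem = m_rem / n
    return (t * t / k) * q_m + (t_rem * t_rem) * q_rem


# Worked-example partition n = 23 = 3 x 6 + 5: isolate the multipliers
# t^2/k and t'^2 by setting one Q to 1 and the other to 0.
print(variance_coefficient(3, 6, 5, q_m=1.0),
      variance_coefficient(3, 6, 5, q_m=0.0, q_rem=1.0))
```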

From the above discussion it is evident that, once the partitioning of sample size n into n = km + m' is determined, the variance and efficiency may be obtained except (in the case of the variance) for a factor $\beta^2$ which must be estimated from the data. Table IV lists for convenience the efficiencies at two probability levels, P = 0.99 and the limiting value P = 1, for most of the sample sizes that may occur in practice with gust-load data, provided the sample is split up according to the above rules. The levels P = 0.99 and P = 1 furnish a convenient basis for comparing the efficiencies of two different partitions of the sample size. At this end of the probability scale the difference between the two efficiencies decreases monotonically as P decreases. Thus, if the difference in efficiencies is 3 percent at P = 0.99 and 4 percent at P = 1, then the difference is between 3 and 4 percent at, say, P = 0.995, and at P = 0.95 or less it is apt to be substantially below 3 percent, a difference negligible for practical purposes. The partitions shown in table IV are those recommended

¹⁰Other multipliers are possible. In particular, there is an optimum set of multipliers which produces an unbiased estimator with slightly smaller variance, and hence slightly greater efficiency. The optimum multipliers are, however, less simple than the proportional ones (for example, they are not constants but depend on P), and the gain in efficiency is not great. This was shown by a number of trials and by the fact that, in any event, the efficiency cannot exceed that for the larger subgroup size m (or m' if m' > m) and does not differ much from it if the total sample size n is at all sizable, say > 20.






by rule (a) above. In certain cases the efficiencies of alternative partitions are shown in the footnotes to table IV, for use in case the extra few percentage points in efficiency are considered to be worth a little loss of simplicity in computation.

There are some useful a priori guides for judging the efficiency in any given case, even beyond the limit n = 40 of table IV. Thus, if n = km + m', it is clear that the efficiency cannot exceed that for the larger of the subgroup sizes m and m' but must lie somewhere between the efficiencies corresponding to these two sample sizes. If m and m' are not far apart, then, regardless of the number of subgroups k, the efficiency is determined between narrow limits. Again, if k is substantial, say near 10 or more, then the efficiency is practically that for the larger sample size m. Of course the maximum efficiency obtainable by the procedure outlined here is for case I when the sample size is an exact multiple of 6. For P = 0.99 the efficiency in such a case is 83.2 percent, and for P = 1 it is 76.8 percent. If any given partition results in efficiencies within, say, 2 or 3 percent of these values, then there is nothing significant to be gained by using any other partition, unless it is such as to simplify the computation.


SUMMARY OF PROCEDURES


The method of analysis will now be summarized for ease of reference. The use of the method has been considerably simplified by the construction of specially designed work sheets. A completely filled out pair (work sheets 1 and 2) will be found immediately preceding the tables at the end of this report. With the aid of such work sheets about 2 hours should be sufficient for all the calculations for a moderate-size sample, such as the sample of 23 observations analyzed below, and it has been found that this period is even sufficient to include the graphical analysis also presented.


The materials needed for application of the method, besides work sheets 1 and 2 and a sheet of extreme probability paper, are, in the order in which needed:

(1) Table IV, showing efficiencies for various methods of splitting the sample into subgroups

(2) Table I, giving the weights $a_i$ and $b_i$

(3) Table III, furnishing the quantities $Q_0'$, $Q_m$, and $Q_{m'}$





The assumptions upon which the method is based are that the data in the given sample (arranged in the order in which observed¹¹) may be treated as independent random observations all from the same population

$$F(x) = \exp\left[-e^{-(x-u)/\beta}\right]$$

(in cumulative form), with constant unknown parameters u and β to be estimated.

For concreteness, the rules below refer to an actual example, worked out in work sheets 1 and 2 and figure 4, consisting of the 23 maximum positive acceleration increments observed in 23 flights of an airplane and identified as "NACA-Langley-Sample III," which are listed in the column headed "Observed extremes, Δn" in work sheet 1. These data are assumed to be given in the order of observation, so that under the above assumptions this arrangement may be considered to be a random one.

Each rule (except rules (2) and (7), which are subdivided) consists of a single paragraph, followed by a detailed explanation of its use inserted for the convenience of the user. This makes the list unavoidably lengthy, but the rules themselves are brief and simple to apply.

Before starting the calculations, it is desirable to plot the data on special probability paper according to the directions in rule (7)(a) under "Graphical analysis" in order to obtain a crude judgment of how well the data fit the assumed distribution. In rearranging the data in order of size, however, care should be taken not to lose the record of the original order in which the data were taken, because randomness would otherwise have to be reintroduced.

As a result of considerable experimentation it is recommended that all computations be carried to exactly the number of places shown for each item in the two work sheets.

Determination of estimators. - The rules for determining the estimators, using work sheet 1, are as follows:

(1) Enter the observations in the second column of work sheet 1 in the order in which given. The first column is for identification purposes.

¹¹If the observations are not available in their original order, it will first be necessary to randomize them by use of a table of random numbers.





(2) Determine the partition of the sample size (if 7 or more, but not extremely large) and split the sample into subgroups as large as possible subject to the following rules (a), (b), or (c). If n is extremely large, say several hundred or more, see appendix D.

(a) If n is an exact multiple of 5 or 6, write n = k × 5 or n = k × 6; if both, use n = k × 6.

(b) If n is not an exact multiple of 5 or 6, write n = k × 6 + m' (or, if division by 6 leaves a remainder of 1, n = k × 5 + m'), where 1 < m' < 6, unless n = 31, 61, and so forth, that is, 1 plus a multiple of 30.

(c) If n is of the form 30k + 1, write it as n = (30k - 5) + 6 = (6k - 1) × 5 + 6; that is, split n up into 6k - 1 subgroups of 5 and a remainder subgroup of 6.

Once k, m, and m' are determined, the blanks in section I of work sheet 1 can be filled in. At the same time, in work sheet 2, the numerical values of m and m' should be entered as subscripts in the headings "$Q_m$" and "$Q_{m'}$" for columns 4 and 5, respectively. In the worked example, n = 23 = 3 × 6 + 5 (rule (b)), so the data are split into three main subgroups of 6 and a remainder subgroup of 5.
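Rules (a) to (c) can be expressed as a short function. The Python sketch below is an illustrative reading of those rules, not part of the report's procedure. The function name is hypothetical. It returns (k, m, m'), with m' = 0 when there is no remainder subgroup, and it does not treat the extremely large samples referred to appendix D any differently.

```python
from typing import Tuple


def partition(n: int) -> Tuple[int, int, int]:
    """Return (k, m, m_rem): k main subgroups of size m and one remainder
    subgroup of size m_rem (0 if there is none), per rules (2)(a)-(c).
    Extremely large samples (appendix D) are not treated differently here."""
    if n <= 6:
        return 1, n, 0                  # a single group; no splitting needed
    if n % 6 == 0:                      # rule (a): multiple of 6 (preferred)
        return n // 6, 6, 0
    if n % 5 == 0:                      # rule (a): multiple of 5 only
        return n // 5, 5, 0
    if n % 6 != 1:                      # rule (b): groups of 6, remainder 2 to 5
        return n // 6, 6, n % 6
    if n % 5 != 1:                      # rule (b): remainder 1 by 6, so use groups of 5
        return n // 5, 5, n % 5
    k = (n - 1) // 30                   # rule (c): n = 30k + 1
    return 6 * k - 1, 5, 6


for n in (23, 30, 25, 13, 31, 61):
    print(n, partition(n))
```

For the worked example, partition(23) gives (3, 6, 5), matching the split used above.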


(3) Find estimators for the parameters β and u by filling in the blanks and following the directions indicated in work sheet 1, sections IIA, IIB, and III.


In section IIA, obtain the weights $a_i$ and $b_i$ from table I for n = m, the size of the main subgroups. Mark off the subgroups by any convenient means,¹² arrange the observations in increasing order within each subgroup, and enter them horizontally opposite the proper subgroup number in section IIA. Obtain the two product sums

$$\sum_{i=1}^{m} a_i x_i \qquad \text{and} \qquad \sum_{i=1}^{m} b_i x_i$$

as indicated in the two right-hand columns, and sum all columns as shown. The two product sums evaluated for the line labeled "Sum" will serve as a check. Form the average T by dividing by the number k of main subgroups.


¹²It was found convenient here to determine the subgroup size m before entering the data in the extreme left columns, so that the subgroups could be plainly indicated by means of a space after every mth observation.





The work in section IIB is analogous, except that the weights $a_i'$ and $b_i'$ are the $a_i$ and $b_i$ shown in table I for n = m', the size of the remainder subgroup; also, since there is only one subgroup, averaging is unnecessary.

Section III combines the (sub)estimators T and T' with the proportionality coefficients t and t', determined in section I, to produce the final over-all sample estimator

$$\xi_P^* = tT + t'T' = 0.9295 + 0.16774\,y_P$$

upon collecting the coefficients of $y_P$ and the constant terms. The estimates of the parameters u and β are read off at once from the coefficients of $\xi_P^*$ and entered. This constitutes the fitting of an extreme-value distribution to the given data.
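The arithmetic of sections IIA, IIB, and III can be sketched in Python as follows. The sketch assumes that each subgroup estimator is the pair of product sums $(\sum a_i x_i, \sum b_i x_i)$, so that every estimator has the form constant + coefficient × $y_P$. The function names are hypothetical. The weights in the usage lines are toy values chosen only to exercise the arithmetic; they are not the table I weights, which must be taken from the report.

```python
from typing import Sequence, Tuple


def product_sums(group: Sequence[float], a: Sequence[float],
                 b: Sequence[float]) -> Tuple[float, float]:
    """(sum a_i x_i, sum b_i x_i) for one subgroup arranged in increasing order."""
    if not len(group) == len(a) == len(b):
        raise ValueError("need one a-weight and one b-weight per observation")
    xs = sorted(group)
    return (sum(ai * x for ai, x in zip(a, xs)),
            sum(bi * x for bi, x in zip(b, xs)))


def fit(main_groups, a, b, remainder=(), a_rem=(), b_rem=()) -> Tuple[float, float]:
    """Return (u*, beta*), the constant term and the y_P coefficient of
    xi_P* = tT + t'T'."""
    k, m = len(main_groups), len(a)
    sums = [product_sums(g, a, b) for g in main_groups]
    # Section IIA: average the k subgroup estimators to get T.
    T = (sum(s[0] for s in sums) / k, sum(s[1] for s in sums) / k)
    # Section IIB: a single remainder subgroup, so no averaging.
    m_rem = len(remainder)
    T_rem = product_sums(remainder, a_rem, b_rem) if m_rem else (0.0, 0.0)
    # Section I multipliers t = km/n and t' = m'/n; section III combination.
    n = k * m + m_rem
    t, t_rem = k * m / n, m_rem / n
    return t * T[0] + t_rem * T_rem[0], t * T[1] + t_rem * T_rem[1]


# Toy weights for subgroups of 2, chosen only to exercise the arithmetic;
# they are NOT the table I weights.
a2, b2 = [0.5, 0.5], [-1.0, 1.0]
u_star, beta_star = fit([[3.0, 1.0], [2.0, 6.0]], a2, b2, [5.0, 3.0], a2, b2)
print(u_star, beta_star)
```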

Predicted values, confidence band, efficiency, and plotting positions. - The predicted values, confidence band, efficiency, and plotting positions are determined as follows, using work sheet 2:

(4) Compute the values of $\xi_P^*$ in column 3 for the values of P and $y_P$ shown in columns 1 and 2. These values constitute the set of predictions for the respective probability levels.

Additional probability levels may be inserted between those shown, if desired. The value of $y_P = -\log_e(-\log_e P)$ is found most conveniently from table 2 of reference 7.
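For levels that are not listed, $y_P$ can also be computed directly from this formula. A minimal Python sketch follows; the function name is hypothetical.

```python
import math


def reduced_variate(P: float) -> float:
    """y_P = -log_e(-log_e P), defined for 0 < P < 1; y_P grows without limit as P -> 1."""
    if not 0.0 < P < 1.0:
        raise ValueError("P must lie strictly between 0 and 1")
    return -math.log(-math.log(P))


for P in (0.36788, 0.50, 0.90, 0.95, 0.99, 0.995, 0.999):
    print(P, round(reduced_variate(P), 5))
```

P = 0.36788 gives $y_P$ close to 0 (the line that yields u), and P = 0.995 gives the value 5.29581 used in rule (7)(d).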

(5) The confidence-band half-widths (68-percent control curves) are computed from the standard deviations as indicated.

The numerical values of the variances $Q_m$ and $Q_{m'}$ in columns 4 and 5 are found under these same headings in table III(a) and entered as shown. The values of $t^2/k$ and $(t')^2$ are entered above these values, as indicated, in order to facilitate computation of the variances of the over-all estimator







in column 6. Column 7 gives the standard deviation of the estimator $\xi_P^*$. It is most easily computed by taking the square root of the coefficient of $\beta^2$ in column 6 and multiplying by the estimate $\beta^*$ found in section III of work sheet 1. Thus $\sigma(\xi_P^*)$ for P = 0.50 is $\sqrt{0.06051}$ times the value $\beta^* = 0.16774$ (written at the top of column 7 for convenience), giving the value 0.0413 shown.

The standard deviation of the estimator measures the reliability, that is, the extent to which repeated application of the procedure to repeated samples taken under the same conditions would give values clustering more or less closely about the unknown parameter value. For example, for a fixed probability P, about 68 percent of the time (when the assumptions are satisfied) the computed interval $\xi_P^*$ plus or minus one standard deviation will contain the true unknown parameter $\xi_P = u + \beta y_P$. For two standard deviations the percentage rises to 95.¹³ Two curved lines, one joining the left-hand end points of these intervals and one joining the right-hand end points, are called control curves (see rule (7) for graphical analysis, below), and these two curves define a confidence band consisting of the area between them. The interval of values of the abscissa $x = \xi_P$ included between the control curves, when P is given a specific value, is called a confidence interval. The standard deviation in column 7 of work sheet 2 is thus the half-width of a 68-percent confidence band (or interval). If, for example, levels of 95 percent are desired, the values can be readily obtained by adding another column consisting of twice the entries in column 7.
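The column 7 computation and the resulting intervals can be sketched in Python. The function names are hypothetical. The 95-percent interval uses the "twice the standard deviation" rule stated above, which rests on the normality assumption of footnote 13. The inputs in the usage line are the P = 0.50 figures quoted in rule (5).

```python
import math
from typing import Tuple


def std_dev(beta_sq_coefficient: float, beta_estimate: float) -> float:
    """Column 7: square root of the beta**2 coefficient in column 6, times the
    estimate of beta from section III of work sheet 1."""
    return math.sqrt(beta_sq_coefficient) * beta_estimate


def confidence_interval(xi: float, sd: float, n_sd: float = 1.0) -> Tuple[float, float]:
    """xi_P* minus and plus n_sd standard deviations: n_sd = 1 gives the
    68-percent interval and n_sd = 2 the 95-percent one (both approximate)."""
    return xi - n_sd * sd, xi + n_sd * sd


sd = std_dev(0.06051, 0.16774)   # P = 0.50 line of the worked example
print(round(sd, 4))              # 0.0413
```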

(6) Efficiency is computed as follows: The values of $Q_0'$ for the indicated values of P are taken from the column headed $Q_0'$ in table III(a), divided by the given sample size n, and entered in column 8 of work sheet 2. The efficiency is obtained by dividing this by the corresponding entry in column 6, canceling the $\beta^2$ (which was one reason for carrying it along separately), and finally entering the result in column 9.
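Columns 8 and 9 reduce to one division per probability level. In the Python sketch below, the function name is hypothetical and the input rows are made-up placeholders, not entries from table III(a) or work sheet 2.

```python
from typing import Iterable, List, Tuple


def efficiency_rows(n: int, rows: Iterable[Tuple[float, float, float]]
                    ) -> List[Tuple[float, float, float]]:
    """For each (P, Q_0', column-6 coefficient) return (P, column 8, column 9):
    column 8 is Q_0'/n and column 9 is column 8 divided by column 6, in percent.
    Both columns are coefficients of beta**2, so beta**2 cancels."""
    out = []
    for P, q0_prime, col6 in rows:
        col8 = q0_prime / n
        out.append((P, col8, 100.0 * col8 / col6))
    return out


# Placeholder inputs only, NOT values from table III(a) or work sheet 2.
for P, col8, eff in efficiency_rows(20, [(0.95, 1.2, 0.075), (0.99, 1.6, 0.1)]):
    print(f"P = {P}: column 8 = {col8:.4f}, efficiency = {eff:.1f} percent")
```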


¹³These percentages are only approximate since they assume $\xi_P^*$ to be normally distributed. As indicated in appendix E, this assumption is sufficiently correct for practical purposes for samples of the order of 100 or more. This may, of course, not be the case for much smaller samples. However, normality assumptions of this kind must often be made in practice in the absence of large-scale investigations to establish more precise distributions. Results obtained in this manner have often been found to be satisfactory.





(7) Graphical analysis consists of plotting the data on suitably ruled paper, drawing the estimated straight line, drawing in the control curves, and seeing how well the data fall within them. The method is essentially due to Gumbel (cf. ref. 13).

(a) In the section of work sheet 2 called "Plotting positions," arrange all n observations in the sample in a single ascending series from smallest to largest and enter them opposite the rank numbers r = 1 to n. Compute and enter the plotting positions $\Phi(x) = r/(n+1)$. Then, on a sheet of extreme probability paper¹⁴ such as that used in figures 4 and 5, plot the points $(x_r, r/(n+1))$: observation $x_r$ is plotted on the uniform scale along the horizontal axis; the fraction $r/(n+1)$ is plotted along the nonuniform vertical scale $\Phi(x)$. These points are plotted as shown in figure 4.

(b) After the points are plotted, the estimated line x = u + βy, that is, x = 0.9295 + 0.16774y (see rule (3), above), is drawn through them. This is easily done from columns 2 and 3 (work sheet 2), since column 3 gives the predicted values of x (= $\xi_P^*$) corresponding to the values of y (= $y_P$) in column 2. An even simpler method is to take two or three widely separated values of P in column 1 together with the corresponding values of $\xi_P^*$, plot them on the $\Phi(x)$ and x scales, respectively, and draw the line through them.

(c) The 68-percent control curves are obtained by measuring off horizontally, at each value of P in column 1, the distance $\sigma(\xi_P^*)$, taken from column 7, to the right and left of the fitted line, and then joining all the right and all the left end points of the intervals so formed, as in figure 4. The area included between the two control curves is the 68-percent confidence band. If most or all of the plotted points fall within the band, as in figure 4, then it is concluded that the fit is satisfactory and furnishes no evidence that any of the basic assumptions are violated.

(d) The fitted straight line provides the predictions for any desired probability level P.¹⁵ For example, the prediction for P = 0.995, which

¹⁴Extreme probability paper is coordinate paper with one scale (x) uniformly spaced and the other (y) distorted in such a manner that the extreme-value distribution $\exp(-e^{-y})$ will plot as a straight line.

¹⁵On the probability paper (figs. 4 and 5), P is denoted by $\Phi(x)$.






means a value of acceleration increment which has only 1 chance in 200 of being exceeded, is obtained (in fig. 4) by reading across to the solid (fitted) line at P = 0.995 and then down to find the value x = 1.82g. This is sufficiently close to the value 1.8176 obtained by calculation, using the value $y_P = 5.29581$. The 68-percent control curves give a confidence interval of approximately 1.66 to 1.98 for this value. This means that there is a probability of about two-thirds that such an interval includes the true value that is being estimated. The efficiency associated with this estimate is between 80.5 percent and 82.6 percent (column 9), sufficiently narrow limits for practical purposes. If a more accurate value for the prediction or measure of efficiency is desired, it can be readily obtained by inserting a "P = 0.995" line in the first table on work sheet 2 and performing the computations indicated in columns 2 through 9.
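The computations behind rule (7) can be sketched in Python. The sketch produces the plotting positions r/(n + 1) together with their ordinates $y = -\log_e[-\log_e(r/(n + 1))]$ (the y scale of extreme probability paper), and it evaluates a point on the fitted line. The function names are hypothetical. The usage line takes $u^* \approx 0.9295$ and $\beta^* \approx 0.16774$ from the worked example and gives about 1.818 at P = 0.995, which rounds to the 1.82g read from figure 4.

```python
import math
from typing import List, Sequence, Tuple


def plotting_positions(data: Sequence[float]) -> List[Tuple[int, float, float, float]]:
    """Rank r, observation x_r (ascending), plotting position r/(n + 1), and the
    corresponding ordinate y = -log_e(-log_e(r/(n + 1))) on extreme probability paper."""
    xs = sorted(data)
    n = len(xs)
    rows = []
    for r, x in enumerate(xs, start=1):
        phi = r / (n + 1)
        rows.append((r, x, phi, -math.log(-math.log(phi))))
    return rows


def fitted_value(u: float, beta: float, P: float) -> float:
    """Point x = u + beta * y_P on the fitted straight line."""
    return u + beta * (-math.log(-math.log(P)))


x_995 = fitted_value(0.9295, 0.16774, 0.995)
print(round(x_995, 3))   # 1.818
```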


COMPARISON WITH METHOD IN PRESENT USE


It is of interest to compare the proposed order-statistics method with the method of moments of Gumbel, which has been used up to now in extreme gust-load computations (ref. 6). The comparison is presented in two aspects: theoretical, involving an empirical attempt to evaluate the bias and efficiency¹⁶ of the Gumbel estimator, and practical, showing how the two methods work out in an actual example.


Theoretical Comparison 

Only the general results of the theoretical comparison will be indicated here, the details being furnished in appendix B. The comparison consists in writing down the Gumbel estimator, a function of the observations involving the sample mean, standard deviation, and the probability factor $y_P$, and then obtaining the bias and the efficiency of the proposed order-statistics estimator relative to the Gumbel estimator.

Of the two characteristics, bias and efficiency, the main interest at this point is in determining the efficiency of the proposed method, since that is the important feature whereby possibilities of cost savings, through taking fewer observations, can arise. Bias is less important for this purpose, and its consideration is therefore limited to appendix B.


¹⁶For a theoretical comparison of confidence bands, see appendix E.





As shown in appendix B (see the section "Comparison With Simplified Gumbel Estimator"), relative efficiency involves the first two moments of the sample mean and of the sample standard deviation and the covariance of the mean and standard deviation. Of these, only the first two moments of the sample mean can be obtained readily by standard procedures, while a prohibitive amount of numerical integration would be required to evaluate the remaining three quantities accurately.

Resort was therefore had to a method whereby the theoretical extreme-value distribution was represented by a large set of suitably constructed random numbers. By means of these numbers a large number of actual random samples were drawn and the results tabulated. This was carried out mechanically with high-speed IBM equipment. By using 12,000 random numbers, 1,200 random samples of 10 were drawn, and a single average figure for relative efficiency was computed for each set of 100 samples. All these computations were made for the single probability level P = 0.95. Other values of P are considered below.
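The sampling approach can be imitated on a modern computer. The Python sketch below is only an illustration under stated assumptions. It draws extreme-value samples by the inverse-transform method. It assumes that the simplified Gumbel estimator has the large-sample moment form $\bar{x} + (\sqrt{6}/\pi)(y_P - \gamma)s$, with γ = 0.5772 (Euler's constant) and s the standard deviation without the n/(n - 1) factor. It records the variance of the estimates within each set of 100 samples, which is one ingredient of a relative-efficiency figure. It does not reproduce table V, and the order-statistics side of the comparison is omitted because it requires the table I weights.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def draw_extreme_value(size, u, beta, rng):
    """Inverse-transform draws from F(x) = exp(-exp(-(x - u)/beta))."""
    out = []
    while len(out) < size:
        p = rng.random()              # uniform on [0, 1)
        if p > 0.0:                   # exclude 0 so that log(p) is finite
            out.append(u - beta * math.log(-math.log(p)))
    return out


def simplified_gumbel_estimate(sample, y_p):
    """Assumed simplified (large-sample) moment estimator of xi_P:
    mean + (sqrt(6)/pi) * (y_P - gamma) * s, with s in population form."""
    s = statistics.pstdev(sample)
    return statistics.mean(sample) + math.sqrt(6) / math.pi * (y_p - EULER_GAMMA) * s


def simulate(n_samples=1200, size=10, P=0.95, u=0.0, beta=1.0, set_size=100, seed=1):
    """Return the true xi_P, all estimates, and the variance within each set."""
    rng = random.Random(seed)
    y_p = -math.log(-math.log(P))
    true_xi = u + beta * y_p
    estimates = [simplified_gumbel_estimate(draw_extreme_value(size, u, beta, rng), y_p)
                 for _ in range(n_samples)]
    set_variances = [statistics.variance(estimates[i:i + set_size])
                     for i in range(0, n_samples, set_size)]
    return true_xi, estimates, set_variances


true_xi, estimates, set_vars = simulate()
print(round(true_xi, 5), len(set_vars), round(statistics.mean(estimates) - true_xi, 3))
```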

The results are shown in table V and portrayed in figure 6. For samples of 10, the efficiency was greater for the proposed order-statistics estimator in 5 cases out of 12 (relative efficiency R (column 8) greater than 1) and greater for the present moment estimator in 7 cases out of 12. The average of all 12 relative efficiencies was very nearly unity. These results suggest that, for samples of 10, the two methods are equally efficient.

The entire procedure was repeated for samples of 20, obtaining 6 (instead of the previous 12) values for the 6 sets of 100 samples each. As table V (column 9) and figure 6 show, the balance now was 5 to 1 in favor of the proposed method, with the average being 1.11, representing an 11-percent greater average efficiency for the proposed method.

For samples of 30, there were 4 sets of 100 samples each, and the results (column 10) were 3 to 1 in favor of the proposed method. The average relative efficiency was 1.15, representing a 13-percent gain in average efficiency.


¹⁷The present discussion compares the order-statistics estimator with the Gumbel estimator $\xi_P = \bar{x} + \frac{\sqrt{6}}{\pi}(y_P - \gamma)s$. As explained in appendix B, this estimator is a simplified form of Gumbel's original estimator and is used when the sample of extremes is large. Appendix B also considers the original Gumbel estimator, which is a more complicated expression used for small samples, and shows that this estimator is both more biased and much less efficient than the simplified estimator.






To see the effect of different probability levels on these results, computations were undertaken for several values of P beyond 0.95. However, in order to avoid needless calculation, in view of the fact that only qualitative conclusions are warranted, the above procedure was modified as follows. The sets of 100 samples were combined for each sample size, and a single over-all average for relative efficiency was obtained for the 1,200 samples of 10, for the 600 samples of 20, and for the 400 samples of 30, the computations being carried out for the selected probabilities P = 0.95, 0.99, and the limiting value, unity. The results are shown in table VI. In addition, theoretical calculations¹⁸ were made to obtain the asymptotic relative efficiencies as sample size increases without limit. These values will be found at the bottom of column 9 of table VI.

The above additional results indicate that increasing the probability P tends to increase the efficiency of the proposed method relative to Gumbel's.

It should be pointed out that these values obtained from the empirical sampling method are indicative, rather than conclusive, on account of the random variation inherent in the method, as manifest in the wide fluctuation in efficiencies shown in table V for the individual sets of 100 samples. Nevertheless, the above results do give strong indication for the following statements:

For samples of 10, the proposed order-statistics method is about as efficient as the method of Gumbel, while for samples of 20 or 30 or more, the proposed method is more efficient. For P = 0.95 or greater, this increase in efficiency is about 12 to 15 percent for samples of 20 to 30 and ultimately rises to 25 to 30 percent for indefinitely large samples.

If, in the comparison presented above, the simplified Gumbel estimator is replaced by the original form of the estimator (see the section "Comparison With Original Gumbel Estimator" in appendix B), then the comparison becomes much more favorable to the proposed order-statistics method, and it can be stated that, for samples of 10, 20, and 30 and P = 0.95 or more, the order-statistics method is up to twice as efficient as the Gumbel method using the original estimator. Moreover, this 100-percent difference in efficiency between the two methods is of sufficient magnitude not to be significantly affected by the sampling errors inherent in the method of evaluation.


¹⁸Since these calculations are mainly of theoretical interest, they have been omitted in order to keep this report from becoming unduly long.





Comparison Based on a Sample of Actual Observations

A comparison of the two methods based on a sample of actual observations will now be made. The same data already analyzed by the order-statistics method will be used, consisting of the 23 maximum acceleration increments listed in work sheet 1. For convenience a standard form of work sheet will be used, employed by the Environmental Protection Section of the Office of the Quartermaster General, Department of the Army (ref. 14), for applying the method of moments of Gumbel. To avoid confusion with work sheets 1 and 2 discussed previously, these new work sheets are referred to as table VII, part (a) and part (b). The items are filled in on both parts as directed, except that the factor N/(N - 1) is ignored in sections I and IV of part (b), since subsequent theoretical investigation has shown its use to be incorrect; also, the x values in section III and the entire section V are not needed for the present purposes. The values of $\sigma_N$ and $\bar{y}_N$ in section II are taken from a table supplied with the work sheets but omitted here.

Comparison is best shown graphically, as in figure 5. It will be seen that in this particular case the fitted lines given by the two methods are not greatly different, the predicted values differing by amounts varying from 0.03g at the P = 0.95 level (1 chance in 20 of being exceeded) to nearly 0.10g for P = 0.999 (1 chance in 1,000 of being exceeded).

The most striking and significant feature of the comparison in figure 5 is the narrowness of the confidence band for the order-statistics method compared with that of the Gumbel method. This is attributable mainly to the fact that, in the case of the order-statistics estimator, the confidence-band width is based on the standard deviation of the estimator, computed by the methods indicated in this report, whereas in the case of the moment (Gumbel) estimator, the standard deviation, whose value is not known, is replaced by a standard deviation that can be readily calculated but which results in an unnecessarily wide confidence band (for details, see appendix E).


Advantages and Limitations of Proposed Method 

From the discussion given herein it appears that the proposed order-
statistics method offers the following advantages over the method of
moments now in use:

(a) The proposed method provides for the first time an estimator
known to be unbiased, whose efficiency can be simply and accurately
evaluated.

(b) The new estimator is more efficient than a simplified form of
the Gumbel estimator for samples of about 20 or more and P = 0.95
and more. Compared with the original form of the Gumbel estimator, the
new estimator is up to twice as efficient for the same range of values
of P and for samples of 10 or more.

(c) The calculations necessary for the proposed method are simple
and unified, giving simultaneously (1) estimates of both parameters,
(2) the predicted values corresponding to assigned probabilities and the
reliability of these values, and (3) estimates of the efficiency of the
method.

(d) The proposed method uses a more exact procedure for obtaining
the reliability of predicted values, and this procedure yields smaller
confidence intervals in many cases. (See appendix E.)

The following two limitations of the proposed method should be kept 
in mind: 


(a) As is true of any other method of analyzing data, use of the
proposed method is appropriate only when the assumptions upon which it
is based may be considered to be approximately satisfied; namely, all
the observations constitute an independent random sample from the same
population

$$F(x) = \exp\left[-e^{-(x-u)/\beta}\right]$$

(b) The assumption that the data are available in the order
in which they were observed is of some importance. For if the data are
first rearranged, grouped, or processed in any manner, their randomness
must be considered lost. In order to use the proposed method it will
then be necessary to restore randomness by use of a table of random
numbers to rearrange the data. This is less desirable, and the original
order should therefore be preserved if possible.
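The rearrangement by random numbers described in (b) amounts to applying a random permutation to the data. The sketch below uses Python's pseudo-random generator in place of a printed table of random numbers; that substitution, the helper name, and the sample values are assumptions for illustration.

```python
import random


def restore_random_order(values, seed=None):
    """Return a randomly permuted copy of `values`.

    Every ordering is equally likely, which is what a table of random
    numbers is used for when the original order of observation has been lost.
    """
    rng = random.Random(seed)   # a seeded generator makes the rearrangement reproducible
    shuffled = list(values)     # copy, so the processed data are left untouched
    rng.shuffle(shuffled)
    return shuffled


sorted_data = [1.12, 1.25, 1.31, 1.40, 1.58, 1.77]   # hypothetical values, already sorted
print(restore_random_order(sorted_data, seed=7))
```

Recording the seed plays the part of noting which page and column of the random-number table were used, so the rearrangement can be checked later.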


This necessity of avoiding preliminary processing imposes a
disadvantage on the proposed method, as compared with the Gumbel method of
moments, when the sample is very large (several hundred or more, say).
In the latter method the data may be grouped, simplifying the
computations. The method of order statistics, on the other hand, is not
applicable with grouped data (each observation must be treated on an
individual basis) and hence is not suitable, as the Gumbel method is, for
occasional enormous samples. However, for such masses of data an even
simpler method, described in appendix D, is available.

The increased amount of information provided by the new method comes at
some loss in simplicity of calculation as compared with the Gumbel method.





CONCLUDING REMARKS


This report has developed and illustrated a new method of analyzing
extreme-value data based on order statistics that is convenient and offers
certain important advantages over the method of moments of Gumbel now in
use, as well as being subject to certain limitations.

In view of these considerations, this new method is recommended for
practical use in place of the present method of estimation in cases where
a limited amount of data must be made to yield as precise results as
possible.

In developing an estimator intended to be useful and efficient, a
number of subsidiary questions were encountered and treated. The most
important of these were (1) obtaining minimum-variance unbiased linear
functions of order statistics for small samples and (2) finding the most
feasible way of breaking up a large sample into subgroups small enough
to take advantage of the results in (1). In addition, considerable
attention was given to a number of theoretical points of difference
between the present and proposed methods.

Such theoretical study showed that one feature of the present Gumbel
method, namely, determination of the confidence intervals or control
curves for large values of the probability level P, does not appear to
have an accurate theoretical basis and that, as a result, certain
adjustments should be made in the formulas. These adjustments would have the
effect of replacing the parallel control lines by diverging curves in the
regions of high values of P, resulting in smaller confidence intervals
for the more common values of P and larger intervals for the higher
values of P that occur less often in practice, as might be expected
intuitively.

The solutions to the above two main auxiliary problems have been
incorporated into a set of tables and a pair of unified work sheets
designed so that the computations show at a glance the essential
quantities of interest: the actual predictions, their reliability, and the
efficiency of the method. The method includes provision for showing
these results graphically.

The present study has also devoted some attention to a method
involving empirical random sampling and IBM tabulating equipment in
cases where direct numerical evaluation is prohibitive. The use of
12,000 random numbers and from 400 to 1,200 random samples was found
insufficient to yield accurate quantitative results for one form of the
Gumbel estimator (the simplified form) on account of sampling variation.
However, definite qualitative results in favor of the proposed method
were indicated in the case of samples of 20 and 30, and theoretical
calculation showed that this advantage was considerably greater for
indefinitely large samples.





As a result of the experience gained in these studies, it seems
likely that for accurate results perhaps 10 times the number of samples
used (or more) should be taken and the computations performed through
specialized procedures on high-speed electronic computing equipment.

Further calculation showed that, in the case of the original form
of the Gumbel estimator, much more definite statements were possible
concerning efficiency. In this comparison the proposed estimator
turned out to be up to twice as efficient as that of Gumbel, not only
for the sample sizes of 20 and 30 but down to samples of 10 as well.
Although for very large samples this advantage dropped considerably, the
proposed estimator remained at least 20 to 30 percent more efficient.


National Bureau of Standards,

Washington, D. C., January 13, 1953.





APPENDIX A 

PROOF THAT SUFFICIENT STATISTICS DO NOT EXIST FOR THE
PARAMETERS OF THE EXTREME-VALUE DISTRIBUTION19

Problem: Consider a sample of n from the extreme-value population
whose density function20 is

$$f(x) = \alpha\, e^{-\alpha(x-u) - e^{-\alpha(x-u)}}$$

The parameters $\beta = 1/\alpha > 0$ and u are unknown, and it is desired to
find sufficient statistics for them.

Theory: (1) If $t = (t_1, t_2, \ldots, t_m)$ is sufficient (i.e., is a set
of jointly sufficient statistics) for $\theta = (\theta_1, \ldots, \theta_k)$, then the
density function of $x = (x_1, \ldots, x_n)$ may be written in the form

$$p(x, \theta) = f(t, \theta)\, g(x)$$

(2) If $t(x) = t(x')$ for sample points x and x', then

$$A = \frac{p(x, \theta)}{p(x', \theta)} = \frac{g(x)}{g(x')} = h(x, x')$$

(3) Hence for all those points where t(x) has a constant value
the ratio A is free of θ, and thus sufficient statistics can be
found by seeing for which point sets A is constant.


19This appendix has been prepared by Mr. I. Richard Savage of the
Statistical Engineering Laboratory, National Bureau of Standards.

20For convenience the symbol α is used in place of the
parameter 1/β of the text.





(4) Evidently, if $\theta = g(\theta')$ (i.e., $\theta_i = g_i(\theta_1', \ldots, \theta_k')$,
$i = 1, \ldots, k$) is a nonsingular transformation of the parameters,
then also

$$A = \frac{p(x, g(\theta'))}{p(x', g(\theta'))} = \frac{f(t, g(\theta'))\, g(x)}{f(t, g(\theta'))\, g(x')} = \frac{g(x)}{g(x')}$$

using the same (set of estimators) t as for θ. In other words, if
a set of statistics t is sufficient for a set of parameters θ, the
same set t is sufficient for any other set θ' obtained from θ by
a nonsingular transformation.


Results: The above theory will now be applied to the problem at
hand, and it will be shown that the largest point set on which A is
constant contains n! points, that is, it takes n functions to
describe t, so that the resulting sufficient statistic is the trivial
set $t = (x_1, \ldots, x_n)$ or $\left(\sum x_i/n, \sum x_i^2/n, \ldots, \sum x_i^n/n\right)$.
In other words, the only sufficient statistics are the n observations
themselves, so that there is not a basis upon which to construct optimum
estimators.

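Analysis: For the distribution f(x),

$$A = \frac{\prod_{i=1}^{n} f(x_i)}{\prod_{i=1}^{n} f(x_i')}$$

If A is free of the parameters $\alpha = 1/\beta$ and u, then so are
$\log_e A$ and $\partial^k \log_e A / \partial\alpha^k$. Hence

$$\log_e A = n\alpha(\bar{x}' - \bar{x}) - \sum_{i=1}^{n}\left[e^{-\alpha(x_i-u)} - e^{-\alpha(x_i'-u)}\right]$$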





Let u approach −∞. It is first found that $\bar{x} = \bar{x}'$ in order to have
$\log_e A$ free of u and α. Next,

$$\frac{\partial^k \log_e A}{\partial\alpha^k} = (-1)^{k+1}\sum_{i=1}^{n}\left[(x_i - u)^k e^{-\alpha(x_i-u)} - (x_i' - u)^k e^{-\alpha(x_i'-u)}\right] = 0, \qquad k = 2, 3, \ldots$$

and this is true for k = 1 as well, since $\bar{x} = \bar{x}'$.

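Since this is an identity in u, set u = 0. Then

$$\sum_{i=1}^{n} x_i^k e^{-\alpha x_i} = \sum_{i=1}^{n} x_i'^k e^{-\alpha x_i'}, \qquad k = 1, 2, \ldots$$

These are finite sums; and, therefore, since they are identities
in α, it is clear, since α may converge to zero, that

$$\sum_{i=1}^{n} x_i^k = \sum_{i=1}^{n} x_i'^k, \qquad k = 1, 2, \ldots, n$$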


Thus the largest set of points of constancy of A consists of those 
points which give the same sample moments, and this fact implies the 
desired result. 

Statement (4) above implies that the result also holds if the
parameter u is replaced by $\xi_P = u + \beta y_P = u + y_P/\alpha$.

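Example: To show how this method works for a familiar problem,
consider a sample of n from a normal distribution; here

$$A = \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^{n}(x_i - \mu)^2 - \sum_{i=1}^{n}(x_i' - \mu)^2\right]\right\}$$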





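$$-2\log_e A = \frac{1}{\sigma^2}\left[\sum x_i^2 - \sum x_i'^2 - 2\mu\left(\sum x_i - \sum x_i'\right)\right]$$

and clearly the necessary and sufficient condition for A to be
constant for all values of μ and σ is that $\sum x_i = \sum x_i'$ and
$\sum x_i^2 = \sum x_i'^2$, which is the classical result that the first two
moments are sufficient statistics.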
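The contrast between the two cases can be checked numerically. The sketch below (illustrative sample values, hypothetical function names) evaluates $\log_e A$ for two samples that share their first two moments but are not rearrangements of each other. For the normal density the ratio does not change with the parameters, whereas for the extreme-value density it does, as the argument above requires.

```python
import math


def ev_loglik(xs, alpha, u):
    """Log-likelihood for the density alpha * exp(-alpha(x-u) - exp(-alpha(x-u)))."""
    return sum(math.log(alpha) - alpha * (x - u) - math.exp(-alpha * (x - u)) for x in xs)


def normal_loglik(xs, mu, sigma):
    """Log-likelihood for a normal density with mean mu and standard deviation sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
               for x in xs)


def log_ratio(loglik, xs, xs_prime, *params):
    """log_e A = log p(x, theta) - log p(x', theta)."""
    return loglik(xs, *params) - loglik(xs_prime, *params)


# Same sum (6) and same sum of squares (18), but not a permutation of each other.
x = [0.0, 3.0, 3.0]
x_prime = [1.0, 1.0, 4.0]

ev_params = [(1.0, 0.0), (0.5, 0.0), (2.0, 1.0)]        # (alpha, u)
normal_params = [(0.0, 1.0), (2.5, 0.7), (-1.0, 3.0)]   # (mean, standard deviation)

ev_values = [log_ratio(ev_loglik, x, x_prime, *prm) for prm in ev_params]
normal_values = [log_ratio(normal_loglik, x, x_prime, *prm) for prm in normal_params]
print("extreme value:", [round(v, 5) for v in ev_values])
print("normal:       ", [round(v, 5) for v in normal_values])
```

Only a permutation of x leaves the extreme-value ratio at zero for every (α, u), which is the n!-point set of constancy described above.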





APPENDIX B

DETAILS OF THEORETICAL COMPARISON BETWEEN ORDER-STATISTICS
ESTIMATOR AND MOMENT ESTIMATOR OF GUMBEL


Since the order-statistics estimator has been fully discussed in
the text, the remaining problem in making a comparison between it and
Gumbel's moment estimator is, essentially, to develop the
characteristics of the Gumbel estimator.


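The method of moments of Gumbel in present use provides the
following estimators for the parameters u and β (ref. 6, p. 11, eqs. (26)
and (27); also ref. 13, p. 10, eq. (29), but read $(-\bar{y}_n/\sigma_n)$ for $(\bar{y}_n/\sigma_n)$):

$$\hat{u} = \bar{x} - \frac{\bar{y}_n}{\sigma_n}\, s_x, \qquad \hat{\beta} = \frac{s_x}{\sigma_n} \qquad \text{(B1)}$$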


where $\bar{x}$ and $s_x$ are the mean and standard deviation of the given sample
of size n; $\bar{y}_n$ is a certain computed quantity, depending on the sample
size n, which approaches Euler's constant γ = 0.5772... from below
as n becomes infinite; and $\sigma_n$ is another computed quantity, depending
on n, which approaches $\pi/\sqrt{6} = 1.28255\ldots$ from below as n becomes
infinite.


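For sufficiently large samples the quantities $\bar{y}_n$ and $\sigma_n$ may be
replaced by their limiting values.21 This gives the somewhat simpler
estimators, for computation purposes,

$$\hat{u}' = \bar{x} - \gamma\,\frac{\sqrt{6}}{\pi}\, s_x, \qquad \hat{\beta}' = \frac{\sqrt{6}}{\pi}\, s_x \approx 0.7797\, s_x \qquad \text{(B2)}$$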


21These limit values have been used, for example, in reference 4,
page 176, and in reference 6, page 10.





It is shown below that the net effect of this simplification is to
diminish the bias and to understate greatly the relative efficiency
of the order-statistics estimator to the Gumbel estimator. Since the
asymptotic form (B2) involves simpler notation and is occasionally used
in practice, it has seemed desirable to present this case in detail
below (see the next section) and also in the main text. The
corresponding results for the original form (B1) are indicated in the
section "Comparison With Original Gumbel Estimator" and tabulated in
table VI.


Comparison With Simplified Gumbel Estimator

From the estimators (B2) the following estimator of $\xi_P$ can be
built up, which will be denoted by $\hat{\xi}_G$:

$$\hat{\xi}_G = \hat{u}' + \hat{\beta}' y_P = \bar{x} + (y_P - \gamma)\frac{\sqrt{6}}{\pi}\, s_x \qquad \text{(B3)}$$

This is a function of the n sample values $x_1, x_2, \ldots, x_n$, and it
is desired to find its mean and variance, and thence its bias and
efficiency.
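Equations (B2) and (B3) reduce to a few lines of arithmetic. The sketch below is illustrative only; it assumes, in line with the remark in the main text that the factor n/(n - 1) is not to be used, that $s_x$ is computed with divisor n. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329
SQRT6_OVER_PI = math.sqrt(6) / math.pi   # about 0.7797


def simplified_gumbel(xs):
    """Simplified moment estimators (B2): returns (u_hat', beta_hat')."""
    n = len(xs)
    mean = sum(xs) / n
    s_x = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)   # divisor n
    beta_hat = SQRT6_OVER_PI * s_x
    u_hat = mean - EULER_GAMMA * beta_hat
    return u_hat, beta_hat


def simplified_gumbel_prediction(xs, p):
    """Estimator (B3) of xi_P: u_hat' + beta_hat' * y_P."""
    u_hat, beta_hat = simplified_gumbel(xs)
    y_p = -math.log(-math.log(p))
    return u_hat + beta_hat * y_p
```

Writing the prediction as $\hat{u}' + \hat{\beta}' y_P$ rather than directly as $\bar{x} + d\,s_x$ keeps both fitted parameters available for plotting the line; the two forms agree algebraically.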

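The mean is

$$E(\hat{\xi}_G) = u + \gamma\beta + (y_P - \gamma)\frac{\sqrt{6}}{\pi}E(s)\,\beta$$

which can be rearranged to give

$$E(\hat{\xi}_G) = \xi_P + \left[\frac{\sqrt{6}}{\pi}E(s) - 1\right](y_P - \gamma)\beta \qquad \text{(B4)}$$

where $\xi_P = u + y_P\beta$, $E(\bar{x}) = u + \gamma\beta$, $E(s_x) = \beta E(s)$, and E(s) is the
expected value of the sample standard deviation s when the sample is from
the reduced extreme-value distribution $\exp(-e^{-y})$. Equation (B4) shows
that the Gumbel estimator is biased (unless $E(s) = \pi/\sqrt{6}$ for all sample
sizes, which seems highly unlikely), with bias

$$b(\hat{\xi}_G) = E(\hat{\xi}_G) - \xi_P = \left[\frac{\sqrt{6}}{\pi}E(s) - 1\right](y_P - \gamma)\beta \qquad \text{(B5)}$$

The variance of the estimator is

$$\sigma^2(\hat{\xi}_G) = \left[\frac{\pi^2}{6n} + \frac{6}{\pi^2}(y_P - \gamma)^2\sigma^2(s) + 2\frac{\sqrt{6}}{\pi}(y_P - \gamma)\operatorname{cov}(\bar{y}, s)\right]\beta^2 \qquad \text{(B6)}$$

where $\sigma^2(s)$ is the variance of the sample standard deviation for
samples from the reduced distribution $\exp(-e^{-y})$, and $\operatorname{cov}(\bar{y}, s)$
is the covariance of the mean and standard deviation in such samples.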


22An unbiased estimator analogous to $\hat{\xi}_G$ is

$$\hat{\xi}_0 = \bar{x} + (y_P - \gamma)\, s_x / E(s)$$

for, as in equation (B4),

$$E(\hat{\xi}_0) = u + \gamma\beta + (y_P - \gamma)E(s)\beta/E(s) = u + \beta y_P = \xi_P$$

However, this estimator could not be used in an actual problem since E(s)
is not known. Computation of this quantity was one of the aims of the
IBM computing procedures discussed in the text.






The efficiency of $\hat{\xi}_G$ could be evaluated by suitable generalization
of equation (19) to biased estimators. The variance in the
denominator would be replaced by the mean square error.23 The numerator
would have to be replaced by a complicated expression which, for unbiased
estimators, would reduce to the numerator of equation (19). Instead of
evaluating efficiency for the biased estimator, therefore, the discussion
will be greatly simplified by limiting it to relative efficiency. The
relative efficiency of one estimator $T_1$ to another $T_2$ is defined as the
ratio of mean square errors

$$R(T_1, T_2) = \frac{\mathrm{MSE}(T_2)}{\mathrm{MSE}(T_1)} \qquad \text{(B7)}$$

This ratio has been used as an index of comparison of two estimators
(e.g., ref. 15). Thus, the relative efficiency of the order-statistics
estimator L to the Gumbel estimator $\hat{\xi}_G$ is, by equation (15) and the
fact that the former estimator is unbiased,

$$R(L, \hat{\xi}_G) = \frac{\mathrm{MSE}(\hat{\xi}_G)}{\sigma^2(L)} = \frac{[\text{Eq. (B5)}]^2 + \text{Eq. (B6)}}{(Q_m/k)\,\beta^2} \qquad \text{(B8)}$$

where k is the number of subgroups of size m into which the sample
of n is partitioned24 (eq. (25), assuming there is no remainder
subgroup), and the expressions needed for the numerator are given by the
equation numbers indicated.


23For discussion of mean square error see equation (15) and
accompanying text.

24Thus, n = 10 = 2 × 5 gives k = 2 and m = 5; n = 20 = 4 × 5
gives k = 4 and m = 5; n = 30 = 5 × 6 gives k = 5 and m = 6.
For n infinite, m is taken as 6.
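Once the bias (B5) and variance (B6) are known in units of β and β², equation (B8) is a ratio in which β² cancels. The sketch below assumes, as the partition in footnote 24 suggests, that the variance of the order-statistics estimator is $Q_m\beta^2/k$; the value of $Q_m$ would come from table I and is left as an input rather than reproduced here. The function name is hypothetical.

```python
def relative_efficiency(bias_over_beta, var_over_beta2, q_m, k):
    """Relative efficiency (B8) of the order-statistics estimator to the Gumbel estimator.

    bias_over_beta : bias of the Gumbel estimator divided by beta, eq. (B5)
    var_over_beta2 : variance of the Gumbel estimator divided by beta**2, eq. (B6)
    q_m            : variance coefficient of the order-statistics estimator, subgroups of size m
    k              : number of subgroups of size m
    """
    mse_gumbel = var_over_beta2 + bias_over_beta ** 2   # mean square error = variance + bias^2
    var_order_statistics = q_m / k                      # assumed: average of k independent subgroup estimates
    return mse_gumbel / var_order_statistics
```

For n = 20, for example, footnote 24 gives k = 4 and m = 5, so the call would pass the table I coefficient for m = 5 together with k = 4.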





The key quantities needed in the calculation of relative efficiencies
are, from equations (B5) and (B6), E(s), σ²(s), and cov(ȳ, s). For
general sample size n, their exact values are given by multiple integrals
whose evaluation would apparently require a prohibitive amount of labor.
Instead, the following method of empirical sampling was used with the aid
of IBM calculating and tabulating equipment.

The universe of (reduced) extreme values $\Phi(y) = \exp(-e^{-y})$ was
approximated by constructing a population of 12,000 suitable random
numbers and punching each number on an IBM punch card. These were then
mechanically separated into 1,200 random samples of size n = 10, and
for each sample the mean $\bar{y}$, standard deviation s, and their product $\bar{y}s$
were obtained. This was equivalent to having a "population" of 1,200
means, one of 1,200 standard deviations, and one of 1,200 products of the
mean and standard deviation. It was then assumed that the arithmetic
mean of each of the three populations would be a close approximation to
the mathematical expectations (averages) of the desired quantities, so
that these approximations could be taken as estimates of the moments E(s)
and $E(\bar{y}s)$. From these values and the relation

$$E(s^2) = \frac{n-1}{n}\cdot\frac{\pi^2}{6}$$

the variance

$$\sigma^2(s) = E(s^2) - [E(s)]^2 = \frac{n-1}{n}\cdot\frac{\pi^2}{6} - [E(s)]^2$$

was computed, and also the covariance

$$\operatorname{cov}(\bar{y}, s) = E(\bar{y}s) - E(\bar{y})E(s) = E(\bar{y}s) - \gamma E(s)$$

The five quantities $E(\bar{y})$, $\sigma^2(\bar{y})$, E(s), σ²(s), and cov(ȳ, s) are
shown in table VIII, together with the corresponding theoretical values
that can be readily calculated.
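The card procedure can be imitated with a pseudo-random generator. The sketch below is a modern analogue under stated assumptions, not the original IBM routine: it draws 12,000 reduced extreme-value variates with NumPy's `Generator.gumbel` (whose default location 0 and scale 1 give $\Phi(y) = \exp(-e^{-y})$), deals them into 1,200 samples of 10, and applies the relations above, with s computed using divisor n as the relation for $E(s^2)$ presupposes. The seed and function name are arbitrary.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def empirical_moments(n, num_samples, rng):
    """Estimate E(ybar), var(ybar), E(s), var(s), cov(ybar, s) for samples of size n."""
    y = rng.gumbel(size=num_samples * n).reshape(num_samples, n)  # "cards" dealt into samples
    ybar = y.mean(axis=1)
    s = y.std(axis=1)                        # ddof=0: divisor n
    e_s = s.mean()
    e_s2 = (n - 1) / n * np.pi ** 2 / 6      # exact E(s^2) for the reduced distribution
    return {
        "E(ybar)": ybar.mean(),
        "var(ybar)": ybar.var(),
        "E(s)": e_s,
        "var(s)": e_s2 - e_s ** 2,                              # sigma^2(s) = E(s^2) - [E(s)]^2
        "cov(ybar,s)": (ybar * s).mean() - EULER_GAMMA * e_s,   # E(ybar s) - gamma E(s)
    }


rng = np.random.default_rng(12000)
moments = empirical_moments(10, 1200, rng)
for name, value in moments.items():
    print(f"{name:12s} {value:.4f}")
print("theoretical E(ybar) =", round(EULER_GAMMA, 4), " var(ybar) =", round(np.pi ** 2 / 60, 4))
```

The first two entries can be checked against the exact values γ and $\pi^2/(6n)$, which gives a rough gauge of how far the other three may be disturbed by sampling.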

In actual use this procedure was modified somewhat, since only one 
value of each of the desired quantities would be produced by the 
12,000 cards and 1,200 samples. This single value would be subject to 





the fluctuations of random sampling and would be difficult to rely on in
making inferences. This difficulty was met by breaking the "population"
of 1,200 samples into 12 sets of 100 samples and obtaining 12 values of
each of the desired moments instead of only one. These 12 values,
although each was based on fewer samples, served to furnish an idea of
how the single value based on 1,200 samples was affected by sampling
variation. Such analysis has provided a far firmer basis for judgment
of relative efficiency.
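Splitting the samples into sets of 100 and evaluating the bias (B5), variance (B6), and mean square error at P = 0.95 for each set gives a direct view of this sampling variation. The sketch below repeats the assumptions of a pseudo-random generator in place of punched cards and divisor n for s; results are in units of β and β², and the seed and names are arbitrary.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def per_set_errors(n, num_sets, per_set, p, rng):
    """Rows of (E(s), bias/beta, variance/beta^2, MSE/beta^2) for the simplified Gumbel estimator."""
    y_p = -np.log(-np.log(p))
    d = np.sqrt(6) / np.pi * (y_p - EULER_GAMMA)          # multiplier of s_x in (B3)
    e_s2 = (n - 1) / n * np.pi ** 2 / 6
    y = rng.gumbel(size=(num_sets, per_set, n))
    ybar = y.mean(axis=2)
    s = y.std(axis=2)                                     # divisor n
    rows = []
    for j in range(num_sets):
        e_s = s[j].mean()
        var_s = e_s2 - e_s ** 2
        cov = (ybar[j] * s[j]).mean() - EULER_GAMMA * e_s
        bias = (np.sqrt(6) / np.pi * e_s - 1) * (y_p - EULER_GAMMA)        # eq. (B5) / beta
        variance = np.pi ** 2 / (6 * n) + d ** 2 * var_s + 2 * d * cov     # eq. (B6) / beta^2
        rows.append((e_s, bias, variance, variance + bias ** 2))
    return np.array(rows)


rng = np.random.default_rng(100)
table = per_set_errors(n=10, num_sets=12, per_set=100, p=0.95, rng=rng)
print("MSE/beta^2 by set:", np.round(table[:, 3], 3))
print("spread (max - min):", round(table[:, 3].max() - table[:, 3].min(), 3))
```

Because the bias depends on the set only through E(s), the spread of the bias column is simply a rescaled spread of the 12 estimates of E(s).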

The above procedure resulted in moments calculated for samples of
size n = 10. In like manner, 600 random samples of size n = 20 were
drawn, after starting afresh by putting all 12,000 cards together, but
this time only 6 instead of 12 sets of 100 samples were available,
resulting in 6 values of the desired quantities for comparison. Finally,
the 12,000 cards were reprocessed to yield 400 samples of size n = 30,
giving 4 values each based on a set of 100 samples.

The resulting sets of 12, 6, and 4 values each were substituted in
the appropriate formulas (B5), (B6), and (B8) in order to obtain the
relative efficiency of the order-statistics estimator to the (simplified)
Gumbel estimator. These formulas, all of which depend upon $y_P$, were
evaluated at the probability level P = 0.95. All these results are
summarized in table V, which shows the values of the bias, mean square
error, and relative efficiency calculated for each set of 100 samples
of sizes 10, 20, and 30, together with the corresponding average values
obtained from all 1,200 samples combined.

For ease of comparison, the relative efficiencies are also charted 
in figure 6. 

These results constitute the basis of the statement in the text
that, at the probability level P = 0.95, for samples of 20 and 30 the
proposed method has greater efficiency than the Gumbel method using the
simplified estimator, while for samples of 10 the efficiencies are about
the same.


Comparison With Original Gumbel Estimator

The estimator corresponding to $\hat{\xi}_G$ in equation (B3), built up from
the estimators (B1), is

$$\hat{\xi}_{G,n} = \hat{u} + \hat{\beta} y_P = \bar{x} + k_n s_x \qquad \text{(B9)}$$

where

$$k_n = \frac{y_P - \bar{y}_n}{\sigma_n} = b_{n,P}\, d \qquad \text{(B10)}$$

Here $d = \frac{\sqrt{6}}{\pi}(y_P - \gamma)$, and

$$b_{n,P} = \frac{\pi}{\sqrt{6}\,\sigma_n}\cdot\frac{y_P - \bar{y}_n}{y_P - \gamma} \qquad \text{(B11)}$$

is the conversion factor for passing from the multiplier d of $s_x$ in
equation (B3) to $k_n$ in equation (B9). It is apparent from the discussion
at the beginning of this appendix that, for infinitely large values
of n, $b_{n,P} = 1$; equation (B9) includes the asymptotic case.

For finite values of n, however, $\sigma_n < \pi/\sqrt{6}$ and $\bar{y}_n < \gamma$. Hence $b_{n,P}$,
being a product of two factors each greater than 1 (the second because
$y_P > \gamma$ at the levels of P considered here), may considerably
exceed 1, so that the multiplier $k_n$ in equation (B10) becomes
appreciably larger than the multiplier d in equation (B3). Thus,
for samples of 10, 20, and 30, computation shows that, for P = 0.95,
for example,

$$k_{10} = 1.397d, \qquad k_{20} = 1.234d, \qquad k_{30} = 1.173d \qquad \text{(B12)}$$
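The factors in (B12) follow from (B10) and (B11) once $\bar{y}_n$ and $\sigma_n$ are known. The table supplied with the work sheets is not reproduced in this report, so the sketch below assumes the values $\bar{y}_{10} = 0.4952$, $\sigma_{10} = 0.9497$, $\bar{y}_{20} = 0.5236$, $\sigma_{20} = 1.0628$, $\bar{y}_{30} = 0.5362$, $\sigma_{30} = 1.1124$ from Gumbel's published tables. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329
# Assumed (ybar_n, sigma_n) from Gumbel's tables; not given in this report.
GUMBEL_CONSTANTS = {10: (0.4952, 0.9497), 20: (0.5236, 1.0628), 30: (0.5362, 1.1124)}


def conversion_factor(n, p):
    """b_{n,P} of eq. (B11): ratio k_n / d of the original to the simplified multiplier."""
    ybar_n, sigma_n = GUMBEL_CONSTANTS[n]
    y_p = -math.log(-math.log(p))
    scale_factor = math.pi / (math.sqrt(6) * sigma_n)        # > 1 because sigma_n < pi/sqrt(6)
    location_factor = (y_p - ybar_n) / (y_p - EULER_GAMMA)   # > 1 because ybar_n < gamma and y_P > gamma
    return scale_factor * location_factor


def multipliers(n, p):
    """Return (k_n, d): the multipliers of s_x in (B9) and (B3)."""
    ybar_n, sigma_n = GUMBEL_CONSTANTS[n]
    y_p = -math.log(-math.log(p))
    return (y_p - ybar_n) / sigma_n, math.sqrt(6) / math.pi * (y_p - EULER_GAMMA)


for n in (10, 20, 30):
    print(f"k_{n} = {conversion_factor(n, 0.95):.3f} d")
```

With the assumed constants this prints k_10 = 1.397 d, k_20 = 1.234 d, and k_30 = 1.173 d, matching (B12).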





The bias of $\hat{\xi}_{G,n}$ is, in a manner similar to that in the preceding
section, in view of equation (B10),

$$b(\hat{\xi}_{G,n}) = \left[\frac{\sqrt{6}}{\pi}E(s)\, b_{n,P} - 1\right](y_P - \gamma)\beta \qquad \text{(B13)}$$


Table VI (columns 2 and 3) indicates that the presence of the factor $b_{n,P}$
converts the small negative biases into larger positive ones.

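For the variance there is obtained from equation (B9), analogously
to the procedure in the preceding section,

$$\sigma^2(\hat{\xi}_{G,n}) = \left[k_n^2\sigma^2(s) + 2k_n\operatorname{cov}(\bar{y}, s) + \frac{\pi^2}{6n}\right]\beta^2 \qquad \text{(B14)}$$

The corresponding expression (B6) may be written

$$\sigma^2(\hat{\xi}_G) = \left[d^2\sigma^2(s) + 2d\operatorname{cov}(\bar{y}, s) + \frac{\pi^2}{6n}\right]\beta^2 \qquad \text{(B15)}$$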


Comparison of these two expressions shows, since cov(ȳ, s) was found
to be positive, that replacement of d by the larger value $k_n$
considerably increases the variance of the Gumbel estimator. Values of the
variance for the original and simplified estimators are listed in
columns 4 and 5 of table VI. Comparison of these columns indicates that
the variance of the original estimator can become more than half again
as large as the variance of the simplified estimator, depending on sample
size and probability P. The effect is most marked for the lower levels
of P and smaller sample sizes and disappears, as shown, when both these
factors increase.

The result of these increases in bias and variance is to increase
greatly the mean square error (columns 6 and 7, table VI) and thus to
increase the relative efficiency of the order-statistics estimator
(columns 8 and 9). As a result, the order-statistics estimator is up to
twice as efficient as the original Gumbel estimator even for samples as
small as 10. This tremendous increase in efficiency falls off slowly,
as shown, when sample size increases. For fixed sample size the
efficiency increases for large values of P. These differences in
efficiency are sufficiently large to outweigh completely any fluctuations
of random sampling attributable to the empirical sampling method of
evaluation used.

It must be concluded, therefore, that the original Gumbel estimator
is both more biased and much less efficient than its simplified form.
As a result, comparison of the order-statistics estimator with the
simplified Gumbel estimator gives very conservative results and greatly
understates the actual improvement in efficiency of the proposed method
over the method in present use.





APPENDIX C

MATHEMATICAL FORMULATION AND SOLUTION
OF MINIMUM-VARIANCE PROBLEM

Consider an estimator of $\xi_P = u + \beta y_P$ of the form

$$L = \sum_{i=1}^{n} w_i x_i \qquad \text{(C1)}$$

where $x_1 \le x_2 \le \cdots \le x_n$ are the n order statistics of a sample of n
values from the extreme-value distribution (1), and seek to find the
values of the $w_i$ which minimize var(L) subject to

$$E(L) = \xi_P \qquad \text{(C2)}$$

The estimator L in equation (C8) below with weights so determined
is called the minimum-variance, unbiased, (linear) order-statistics
estimator for sample size n.25
Writing

$$x = u + \beta y \qquad \text{(C3)}$$

where y is the reduced variable corresponding to x, one also has

$$x_i = u + \beta y_i \qquad \text{(C4)}$$

where $y_1 \le y_2 \le \cdots \le y_n$ are the n order statistics of a sample
of size n from the reduced distribution $\exp(-e^{-y})$, free of
parameters. It follows that

25This problem has been treated by general matrix methods by Lloyd
(ref. 16). He obtained the solution to a set of equations equivalent to
sets (C7) and (C9) below, but his results were expressed in very general
notation and are not in convenient form for use here.





$$E(x_i) = u + \beta E(y_i) \qquad \text{(C5)}$$

since u and β, though unknown, are constants not subject to sampling
variation when the operation of expectation is performed. The values
$E(y_i)$ have been tabulated in reference 17 for $i = n(-1)\max(1, n-25)$,
$n = 1(1)10(5)60(10)100$.26

These results give readily

$$E(L) = \sum_{i=1}^{n} w_i\left[u + \beta E(y_i)\right] = \xi_P = u + \beta y_P \qquad \text{(C6)}$$

This is required to be an identity for all values of the parameters u
and β. Equating their coefficients gives the two conditions on the
weights $w_i$:

$$\sum_{i=1}^{n} w_i = 1, \qquad \sum_{i=1}^{n} w_i E(y_i) = y_P \qquad \text{(C7)}$$

where the $E(y_i)$ are the numerical values tabulated in reference 17.


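Turning to the variance, there is obtained

$$\operatorname{var}(L) = \sum_{i=1}^{n} w_i^2\operatorname{var}(x_i) + \sum_{\substack{i,j=1 \\ i \ne j}}^{n} w_i w_j\operatorname{cov}(x_i, x_j)$$

26The notation in the table cited differs from that used here;
$E(y_i)$ in this report corresponds to ... in the table.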






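From equation (C4) and the properties of the variances and covariances
of linear estimators,

$$\operatorname{var}(x_i) = \beta^2\operatorname{var}(y_i) = \beta^2\sigma_{ii}, \qquad \operatorname{cov}(x_i, x_j) = \beta^2\operatorname{cov}(y_i, y_j) = \beta^2\sigma_{ij}$$

making an obvious simplification in notation, whence

$$\frac{\operatorname{var}(L)}{\beta^2} = \sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_{ij}w_i w_j = \text{minimum subject to conditions (C7)} \qquad \text{(C8)}$$

This is a constrained minimum problem for variation in the unknown $w_i$
and is equivalent to finding the (unconstrained) minimum of27

$$\beta^2\sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_{ij}w_i w_j + \lambda'\left(\sum_{i=1}^{n} w_i - 1\right) + \mu'\left(\sum_{i=1}^{n} w_i E(y_i) - y_P\right)$$

where λ' and μ' are the Lagrange multipliers. Since β² > 0 is
constant, though unknown, this is the same as minimizing

$$\Phi = \sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_{ij}w_i w_j + 2\lambda\left(\sum_{i=1}^{n} w_i - 1\right) + 2\mu\left(\sum_{i=1}^{n} w_i E(y_i) - y_P\right)$$

where $\lambda = \lambda'/(2\beta^2)$ and $\mu = \mu'/(2\beta^2)$. Setting the derivatives with
respect to $w_k$, where k = 1, 2, ..., n, equal to 0 and dividing by 2,

$$\sum_{i=1}^{n}\sigma_{ik}w_i + \lambda + \mu E(y_k) = 0, \qquad k = 1, 2, \ldots, n \qquad \text{(C9)}$$

27The temporary notation μ and μ' should not be confused with
the symbols for moments.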

These latter are n linear equations which, with the two in
conditions (C7), form a simultaneous system of n + 2 equations in the
n + 2 unknowns $w_i$, λ, μ. The values of λ and μ are useful as a check,
since, if equation (C9) is multiplied by $w_k$ and summed, the result, in
view of conditions (C7) which the $w_i$'s satisfy and equation (C8), is

$$Q_{n,\min} + \lambda + \mu y_P = 0$$

that is,

$$Q_{n,\min} = -\lambda - \mu y_P$$

The minimum value will be denoted by $Q_n$.

Before solving the sets (C7) and (C9) it is necessary to determine
the coefficients in these linear equations. The values of $E(y_i)$ are
tabulated, as already mentioned. The variances and covariances $\sigma_{ii}$
and $\sigma_{ij}$ involve complicated integrals. The author has been successful
in expressing these integrals in terms of simpler ones already tabulated
(ref. 18). The results are, for the variances,


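$$\sigma_{ii} = E(y_i^2) - [E(y_i)]^2$$

$$E(y_i^2) = \frac{n!}{(i-1)!\,(n-i)!}\sum_{r=0}^{n-i}(-1)^r\binom{n-i}{r}g_2(i+r), \qquad i = 1, 2, \ldots, n$$

where

$$g_2(i+r) = \frac{1}{i+r}\left[\frac{\pi^2}{6} + \left(\gamma + \log_e(i+r)\right)^2\right]$$

and γ = Euler's constant = 0.5772156649..., and, for the covariances,

$$\sigma_{ij} = E(y_i y_j) - E(y_i)E(y_j)$$

$$E(y_i y_j) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\sum_{r=0}^{j-i-1}\sum_{s=0}^{n-j}(-1)^{r+s}\binom{j-i-1}{r}\binom{n-j}{s}\,\phi(i+r,\ j-i-r+s), \qquad i < j;\ i, j = 1, 2, \ldots, n$$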


where the function φ is defined by


2 2 

2 tu^(t,u) = (u - t)g 2 (t + u) + t^[gjL(t^ - 2 L ^1 + 

in which $g_2$ is the same function as before,

$$g_1(t) = \frac{1}{t}\left(\gamma + \log_e t\right)$$



and

$$L(1 + x) = \int_{1}^{1+x}\frac{\log_e t}{t - 1}\,dt = \sum_{n=1}^{\infty}\frac{(-1)^{n-1}x^n}{n^2} \qquad (|x| \le 1)$$

is Spence's integral, which has been most extensively tabulated (to
12 places) in reference 19. The function $g_1$ also occurs in an
expression for the means:


$$E(y_i) = \frac{n!}{(i-1)!\,(n-i)!}\sum_{r=0}^{n-i}(-1)^r\binom{n-i}{r}g_1(i+r)$$


The above formulas have been evaluated as far as n = 6 and the
results are listed in table II. The values in the table are believed
to be accurate to the number of places shown. Those for the means
agree, to within a unit in the seventh place, with the seven-place
values to which the means have previously been tabulated.

Table II thus provides the coefficients in the system of
equations (C7) and (C9) in the weights $w_i$ and in λ and μ. The
right-hand sides of these n + 2 equations are $1, y_P, 0, \ldots, 0$, and the
solutions $w_i$, λ, and μ are linear combinations of these with numerical
coefficients which involve only $\sigma_{ij}$ and $E(y_i)$, but not $y_P$.
Hence the solutions are all of the form

$$w_i = a_i + b_i y_P, \quad i = 1, 2, \ldots, n; \qquad \lambda = c_\lambda + d_\lambda y_P; \qquad \mu = c_\mu + d_\mu y_P$$





Substituting these values of $w_i$ in equation (C8) yields an expression
of the form

On ~ ^n^mln ~ ^n^P ^n)^ (CIO) 

The quantities $a_i$ and $b_i$ for the weights $w_i$, and the coefficients
of $Q_n$, are given in table I for n = 2 to 6. The solution
of the system of equations became increasingly lengthy for increasing
values of n, with correspondingly diminishing accuracy, so that the
computations were discontinued beyond n = 6. The procedures for handling
samples larger than n = 6 are explained in the main text of this report.
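Conditions (C7) and equations (C9) form a single symmetric linear system in $w_1, \ldots, w_n$, λ, μ. The sketch below solves it with NumPy. Since the exact values of table II are not repeated here, it estimates the means $E(y_i)$ and covariances $\sigma_{ij}$ by simulation (an assumption made only so the example runs; the report uses exact values), then checks the identity $Q_{n,\min} = -\lambda - \mu y_P$ and the linear form $w_i = a_i + b_i y_P$. Names and the seed are hypothetical.

```python
import numpy as np


def mvu_weights(means, cov, y_p):
    """Solve (C9) with (C7): sum_i sigma_ik w_i + lam + mu E(y_k) = 0, sum w = 1, sum w E(y) = y_p."""
    n = len(means)
    system = np.zeros((n + 2, n + 2))
    system[:n, :n] = cov
    system[:n, n] = 1.0          # coefficient of lambda
    system[:n, n + 1] = means    # coefficient of mu
    system[n, :n] = 1.0          # sum of weights = 1
    system[n + 1, :n] = means    # sum of w_i E(y_i) = y_P
    rhs = np.zeros(n + 2)
    rhs[n], rhs[n + 1] = 1.0, y_p
    solution = np.linalg.solve(system, rhs)
    w, lam, mu = solution[:n], solution[n], solution[n + 1]
    return w, lam, mu, w @ cov @ w   # last item: Q_{n,min} = var(L) / beta^2


# Simulated moments of the reduced order statistics (stand-in for table II).
rng = np.random.default_rng(6)
n = 5
ordered = np.sort(rng.gumbel(size=(200_000, n)), axis=1)
means = ordered.mean(axis=0)
cov = np.cov(ordered, rowvar=False)

y_p = -np.log(-np.log(0.95))
w, lam, mu, q_min = mvu_weights(means, cov, y_p)
a, *_ = mvu_weights(means, cov, 0.0)       # weights at y_P = 0 give a_i
w_one, *_ = mvu_weights(means, cov, 1.0)   # weights at y_P = 1 give a_i + b_i
b = w_one - a
print("weights:", np.round(w, 4), " Q_n:", round(q_min, 4))
```

Solving once at $y_P = 0$ and once at $y_P = 1$ is enough to tabulate $a_i$ and $b_i$, because the right-hand side, and hence the solution, is linear in $y_P$.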





APPENDIX D 

SHORT-CUT METHOD FOR VERY LARGE SAMPLES


If one has a sample of several hundred or more extreme observations,
as may sometimes be the case (e.g., ref. 6, where a sample of 485 extremes
was analyzed), it is possible to select just three out of all the
observations and from them obtain useful estimators.

This technique is based on a method used by Mosteller (ref. 20) for
samples from the normal distribution. If the n sample values from a
(continuous) population whose density is f(x), when arranged in ascending
order, are denoted by the order statistics $x_1, x_2, \ldots, x_n$, and n is
very large, the application of Mosteller's method involves taking the
observations whose ranks are λn, μn, and νn, where 0 < λ < μ < ν < 1,28
with λ, μ, and ν suitably determined, and choosing a and b so that

$$\hat{\xi} = a\, x_{\mu n} + b\,(x_{\nu n} - x_{\lambda n}) \qquad \text{(D1)}$$

is an (asymptotically) unbiased estimator of the parameter $\xi_P = u + \beta y_P$.
(The reason for choosing this particular form is discussed below.) 

The mean and variance of the estimator $\hat{\xi}$ in equation (D1) are
computed from the corresponding moments of order statistics of the form
$x_{\lambda n}$, with n very large and λ a proper fraction not too near 0 or 1.
Under these circumstances the theorem used by Mosteller states that in
the limit, as n increases indefinitely,

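(1) $x_{\lambda n}$ becomes normally distributed, with mean and variance

$$E(x_{\lambda n}) = t_\lambda \qquad \text{(D2)}$$

$$\sigma^2(x_{\lambda n}) = \frac{\lambda(1-\lambda)}{n\,[f(t_\lambda)]^2} \qquad \text{(D3)}$$

where $t_\lambda$ is defined by $\lambda = \int_{-\infty}^{t_\lambda} f(x)\,dx$, and

28When (as will generally be the case) the ranks λn, μn, and νn
are not integers, they will be defined to be the nearest integers to
these quantities.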





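(2) The covariance of any two order statistics $x_{\lambda n}$ and $x_{\nu n}$,
λ < ν, is given by

$$\operatorname{cov}(x_{\lambda n}, x_{\nu n}) = \frac{\lambda(1-\nu)}{n\, f(t_\lambda)\, f(t_\nu)} \qquad \text{(D4)}$$

where $t_\nu$ is defined similarly to $t_\lambda$.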
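Equations (D3) and (D4) give the large-sample variance of any estimator of the form (D1) through the usual rule for a linear combination. The sketch below works with the reduced distribution, so results are in units of β² (since $x = u + \beta y$). The fractions λ = 0.03, μ = 0.20, ν = 0.85 in the example are illustrative, b is computed from the unbiasing condition a = 1, $b = (y_P - y_\mu)/(y_\nu - y_\lambda)$, and the function names are hypothetical.

```python
import math


def reduced_quantile(frac):
    """y with exp(-exp(-y)) = frac."""
    return -math.log(-math.log(frac))


def reduced_density(y):
    """Density of the reduced extreme-value distribution exp(-exp(-y))."""
    return math.exp(-y - math.exp(-y))


def asymptotic_cov(p, q, n):
    """Large-sample cov(y_{pn}, y_{qn}) for p <= q, from (D3)-(D4), reduced units."""
    fp = reduced_density(reduced_quantile(p))
    fq = reduced_density(reduced_quantile(q))
    return p * (1 - q) / (n * fp * fq)


def asymptotic_variance(a, b, lam, mu, nu, n):
    """Variance / beta^2 of a*x_{mu n} + b*(x_{nu n} - x_{lam n}), with lam < mu < nu."""
    coeffs = [(lam, -b), (mu, a), (nu, b)]   # in increasing order of the fraction
    total = 0.0
    for i, (p, cp) in enumerate(coeffs):
        total += cp * cp * asymptotic_cov(p, p, n)
        for q, cq in coeffs[i + 1:]:
            total += 2 * cp * cq * asymptotic_cov(p, q, n)
    return total


lam, mu, nu = 0.03, 0.20, 0.85                      # illustrative fractions
y_p = reduced_quantile(0.95)
b = (y_p - reduced_quantile(mu)) / (reduced_quantile(nu) - reduced_quantile(lam))
for n in (500, 1000, 5000):
    print(n, round(asymptotic_variance(1.0, b, lam, mu, nu, n), 5))
```

The variance falls off exactly as 1/n, so a single evaluation per set of fractions suffices when comparing choices of λ, μ, ν.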


For $\hat{\xi}$ in equation (D1) to be unbiased in the case where f(x) is
the extreme-value distribution,

$$E(\hat{\xi}) = \xi_P = u + \beta y_P \qquad \text{(D5)}$$

must be an identity in u and β. It is first noted that, from
previous discussion in the text (see the section "Extreme-Value Distribution
and Meaning of Parameters"), the parameter $\xi_P$ is precisely the abscissa
of the ordinate which cuts off the area P to the left. Hence one has
simply

$$t_\lambda = u + \beta y_\lambda \qquad \text{(D6)}$$

where $y_\lambda$ is defined by $\exp(-e^{-y_\lambda}) = \lambda$.


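Equations (D1), (D2), and (D5) then give

$$a\,t_\mu + b\,(t_\nu - t_\lambda) = u + \beta y_P$$

or, by equation (D6),

$$a\,(u + \beta y_\mu) + b\,(y_\nu - y_\lambda)\beta = u + y_P\beta$$

from which, upon equating coefficients of u and β,

$$a = 1, \qquad b = \frac{y_P - y_\mu}{y_\nu - y_\lambda} \qquad \text{(D7)}$$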





In principle, the fractions λ, μ, and ν might be determined so as
to minimize the variance of $\hat{\xi}$ and thus make its efficiency a maximum,
but this would require very extensive computation which would not be
warranted on account of the limited importance of efficiency when the
available sample is very large. (For example, a 50-percent-efficient
estimator with a sample of 1,000 gives results equivalent to using a
sample of 500, still a very large sample.29) Instead consider
estimators of $\xi_P$ of the form

$$\hat{\xi} = \hat{u} + y_P\hat{\beta} \qquad \text{(D8)}$$

where $\hat{u}$ and $\hat{\beta}$ are estimators of the two parameters u and β that
involve the fewest possible number of order statistics without undue
sacrifice in efficiency as computed for indefinitely large samples. The
aim is to find, with a minimum amount of computation, separate unbiased
estimators $\hat{u}$ and $\hat{\beta}$ of the parameters u and β, each of which has
minimum variance or best efficiency in some sense, in the hope that the
linear combination (D8), which will also be unbiased, will turn out to
have efficiency which is not unreasonably small. This is a heuristic
method, since the fact that $\hat{u}$ and $\hat{\beta}$ are efficient does not imply
that their combination $\hat{u} + y_P\hat{\beta}$ is efficient. Better estimators
probably exist, but obtaining just one of reasonable efficiency is
satisfactory.

It turns out that the modal parameter u can be estimated by a 
single order statistic. Gumbel has shown (ref. 21, eq.. ( 50 )) that the 
value of p for which x^ best (i.e., with the least variance or most 

efficiency) estimates u is p = 0. 20319 • For simplicity, therefore, 
replace u in equation (d 8) by 


^ - ^.20n 


(D9) 


^^These considerations assume that the sanple of data is already 
at hand, perhaps by a survey already made, such as the U. S. Weather 
Bureau Thunderstorm Project mentioned in reference 6. Of course, if it 
is a question of planning for the securing of data, it is desirable to 
use as efficient sin estimator as possible, but in that case the inves- 
tigation will rarely be sufficiently extensive to provide sanples large 
enough for the method described in this appendix to be applicable. 



MCA TN 3053 


59 


The scale parameter p reg.uires at least two order statistics, or 
rather their difference - x^, for estimation, multiplied hy a 

suitable unhiasing factor which will become absorbed in the expression 
for b in equations (DY) • A considerable number of triads indicate 
that the pair of vadues A = O.03 and v = O.83 gives an estimate 
of p with efficiency probably close to the maximum, if not actually 
maximum. Since very precise results are not being sought, this pair 
of vadues is adopted here . Thus equation (D1) , in view of eqiia- 
tions (DY), becomes 


^ " ^0.20n °-5256^yp + oAY 59 ) (xq 3^^ - ^0.03n) 


(DlO) 


The variance of this estimator is obtained from the rule 


/ m \ 

m 


m 

l5 H 

= a^2a 2 ^ 

2 

Y aiaj cov (xi,xj) 

i,J=l 

1^ J 

\i-j- / 

X— X 



which after sinplification gives 

a2(f) = 8.69l6d2 - 0.068ld + 1.51442 (Dll) 


where 


d = 0.3256yp + 0.1549 


Since | is \mbiased, a measure of its efficiency may be obtained 
by dividing its variance into the Cram^r-Rao lower bound (see 

eq. (19) and acconpanying textj numerical values are given in the 
Q column of table Ill(a)). The results are as follows, for several 
values of P of Interest: 


P 

Efficiency of f 

0.95 

0.645 

.99 

.649 

.999 

.652 

1 (limiting 

.660 

value) 





6o 


NACA TN 3053 


Thiis, "tliis 1 ft-r gp-R aTnp lfi method of estimation is slightly less than two- 
thirds efficient. However, &s noted above, such apparently low effi- 
ciency need not he a serious matter in practice. 

For convenience, a summary of the method described above is given 

here. 


( 1 ) Arrange al 1 n obsearvations (assumed to be independent and 
from the same extreme-value distribiition) in order of increasing size, 
and then rank them from 1 to n. 

( 2 ) By banfl or mechanical sorting, select the three observations x^ 

whose ranks are the nearest integers to 0.03n, 0.20n, and 0.85n. Denote 
these hy sM 

( 3 ) Conpute the predicted values f, for various probability 
levels P, by foimrula (DIO) . 

(4) For each value of P conpute the variance from formula (Dll) . 

( 5 ) Take the sqijare root of the variance to obtain the standard 
deviation. This gives the half -width of the 68-percent confidence band, 
since for large sanples the distribution of | approaches normality. 
Similarly, twice the standard deviation determines the 95-percent confi- 
dence bsind, and 2.58 standard deviations determine the 99 -percent band. 

( 6 ) Obtain the efficiencies by dividing the variance into the 
Cramer-Rao lower bound Qg in table m(a) . 



MCA TN 5053 


61 


APPENDIX E 

ANALYSIS OF CONFIDENCE INTERVALS IN ORDER-STATISTICS 
METHOD AND METHOD OF MDMENTS OF GUMBEL 
Confidence Intei'vals in Order-Statistics Method 
(Based on Normality Assumption) 


In the text (see rule (5) in the section "Summary of Procedures") 
the confidence intervals given for various confidence levels in the 
proposed method are obtained hy laying off a certain nuniber of standard 
deviations, conputed for the estinator |p, on either side of the esti- 
mated value given hy the fitted line. If this is done for different 
values of P and the ends are joined, as in figure k, a confidenpe hand 
is obtained- The number of standard deviations given in the method - 
one for a confidence level of 68 percent, two for a level of 95 percent - 
is haised on the assunption that the estimator fp is normally distributed. 

The purpose of this section is to investigate this assumption more closely. 

It will he recalled that the estimator |p is obtained hy splitting 

the sample into a number of equal groi:ps with perhaps a remainder of dif- 
ferent^size (see text in connection with eq.s. (22), (25), anfl (26)). 

Then f'p can he written (eq.. (26)) 


f'p = tT + t’T’ 


where T is the average of a ceirtain linear function of the sample vari- 
ables (eq. (22)) taken over the k suhgroips, T’ is another linear 
function, and t and t' are constants. Thus |p is the sum of two 

parts: (l) An average of k independent random variables (ITj^)^® all 

with the same distribution and (2) a single variable (t'T’) with a some- 
what different distribution. By the central limit theorem in probability 
(ref. 11 , p. 215) , according to which the average of a number of random 
variables having the s ame distribution (with first two moments existing) 
is asymptotically nor m al as the number of variables increases indefinitely, 
the first part is approximately normal for large values of k. In fact, 

5 ®These variables are independent because the subgroups were assumed 
to be formed independently. 



62 


NACA TW 3053 


extensive experience has shown that a normal distribution is often a 
remarkably close approximation even if the number of variables k is 
under 10 . Furthermore^ the first two moments (actually all) of each 
variable certainly exist - in fact the proposed method is based 

upon their computed values . Hence, it is safe to say that for k = 10 
or more the first part is very closely normal . The second peurt (t'T') 
is a variable which has the same general character as T^ (a weighted 

sum of order statistics; see eq.. (25)) and. hence is believed not to 
impair significantly the approximate normality of fp. Its influence 

is likely to be small, especially if the number of other variables k 
is large. 

For sauries as large an 100 , k = -16 if broken into subgroups of 6, 
or k = 20 if broken into subgroups of 5 - Since these values of k are 
considerably larger than 10, the preceding discussion shows that it is 
q.uite safe to assume normality for gp for samples of 100 or more, so 

that the corresponding multiples of the standard deviation given above 
are sufficiently acciorate in such ceises. In fact, it is likely that the 
normal approximation remains good for practical purposes down to samples 
of 50 or 60, becoming, of cotirse, worse as sample size decreases still 
fiirther. However, in the absence of knowledge about the exact distri- 
bution of the order-statistics estimator fp for smaller samples, the 

normal approximation is apparently the only simple one available for 
determining confidence limits . It may be noted that approximate methods 
are also involved in determination of confidence limits in the Gumbel 
method. This point is further discvissed in the following section. 


Confidence Intervals for Largest Extremes in Gumbel Method 

Gumbel ' s derivation of confidence intervals . - The pinrpose of this 
section is to inquire into the theoretical accuracy of the confidence 
intervals (or confidence band) given for extreme predictions in Gumbel 's 
method. 

In the Gumibel method the 68-percent confidence-interval half -width 
for the largest in a sample of n extremes and for all larger predicted 

values^^ is, in Gumbel's notation -(table Vll(b), sec. IV), 


^^That is, for all values of P beyond n/(n + l) , which is the 
probability assigned to the largest value in the sample, x^. For 

smaller values of P, the confidence interval is given by a different 
method with which this report will not be concerned inasmuch as the 
pr imar y interest is in large values of P corresponding to extreme 
predictions . 


NACA W 3055 


63 


1.141 


l.l4l^; 3 = l/a 


(El) 


where 3 is the scale parameter (or rather, an estimate of it) of the 
extreme-value distribution from which the observations are assumed to 
come. To obtain the confidence interval for a given prediction prob- 
ability P ^ n/(n + 1 ), the value added to, and subtracted 

from, the estimate given by Gumbel, denoted by him by x (table VII (b), 
sec. Ill) and in this report by Gumbel 's (68-percent) confidence 

interval for predictions beyond the largest observed extreme x^ is 

thus given by 


± 1.1413 


(E2) 


where 3 is the scale parameter (or an estimate thereof) of the extreme- 
value popiolation from which the observed extremes x have been asstimed 

to come; 

F(x) = 'l'(y) = exp (-e"y), y = (x - u)/3 (E3) 

The multiplier l.l4l used for the 68-percent confidence band is obtained 
by setting C = 0.68 and solving for y the eq,-uation 


<i’(y) - ®(-y) = C 


(e4) 


which is parameter free and gives y(C) = y(0.68) = 1.14073 (ref. I 3 , 
p. 6) . Thus 


y = -1.14073 to y = 1.14073 


(E5) 


^^From the theory of extreme values the distribution of the largest 
of the observed values x^, in a sample of n extremes, is exactly an 

extreme -value distribution that has the same scale parameter 3* 



6k 


MCA TN 3055 


is the interval for the reduced variate that cuts off (or corresponds to) 
a central area of 0.68 under the extreme-value density cxnrve shown in 
figure 2 . The corresponding interval that cuts off the same area under 
the original (unreduced) x-distrihution thus has width given hy the 
values (E 5 ) multiplied hy the scale factor since 


X = u + yp 


The half -width is therefore l.llK) 75 ^^ that is, equation (ei) . 

The following discussion indicates that this method of obtaining 
confidence Intervals is inaccurate in two respects: (l) The confidence 

interval is of constant instead of increasing width for large values 
of P; (2) the scale parameter used is not strictly applicable. 

Constant width of confidence interval .- The method of Gumbel of 
obtaining confidence lnte 2 rvals (E2) treats the estimator as though 

it has an extreme-value distribution with the same scale parameter 3 
an in the population imder lying the observed extremes x^j^ (including the 

largest extreme Xjj) . This eissuTiption cannot be considered strictly 

valid, since it implies that the confidence width remains constant for 
all large values of P, as equation (El) does not involve P. In other 
words, this asserts that from a sample of 20 observations or even 100, 
for example, statements can be made about events that will occur with 
probability one in a million or billion and yet have the same uncer- 
tainty of only in the present estimate for x as for predictions 

about events with probability, say, 1 in 100 . It does not seem reasonable 
that a limited sample can tell anything at all meaningful about such 
extremely rare events, let alone predict, them with the same amioiant of 
imcertainty no matter what the probability of occurrence. 

This lack of agreement with common intuition indicates that the Gumbel 
estimator = u + ypP cannot be treated, for all large values of P, 
as if it has an extreme -value distribution with constant scale paramieter. 

Besides these considerations, there is another reason why the Gumbel 
estimator does not itself have an extreme -value distribution, at leant 
for large samples of data. The estimator is a sample characteristic of 
the form 


= X + kpS^ 


(e6) 



NACA OIN 3055 


65 


where k is a constant for given values of P. and n. !Ehe appropriate 
distribution of such an expression for large samples is given hy a general 
limit theoron in probability (ref. U, p. 367) to the effect that under 

broad conditions any sample cbarELcteristic based on moments such as 

is, for large values of n, approximately normally distributed- Thus for 
large values of n the Gumbel estimator (e 6 ) should be considered to be 
approximately normal, with variance given by an es^ression which increases 

as kp^ for P and kp large. Moreover, this would yield a confidence 

band that diverges with Increasing P, avoiding the diff ictilty of the 
parallel curves mentioned above. 

Scale parnmater . - Little is known about the exact distribution of 
the Gumbel estimator 1 ^, particularly for small sample sizes. Yet 

even if It were an extreme -valTie distribution (of the form of equa- 
tion (E 3 )), it would seem that its scale parameter woiild not be p but 
a certain multiple of it. Bp, found below. This multiple may be deter- 
mined by considering the relation between the variance of the distribu- 
tion (asstimed extreme-value) of and the scale parameter p-j^ of this 

distribution; 

" T 

But there is available an approximate expression for the left side, namely, 
equation (b 6 ) in appendix B. Hiis is of the form 


= ‘i(yp)p^ 


(e 8 ) 


where p is the scale parameter of the original (extreme -value) 
x-distribution and is s quadratic expression in the probability 

factor yp with coefficients involving the quantities ^^(s) and 

cov (y,s), whose computation by empirical sampling is indicated in 
appendix B; regarded as a known value Sp depending 

on P . Hence 



66 


NACA TN 5053 


SulDstituting in equation (ET) gives 


Pi 




(ElO) 


wMch defines the multiple Bp. Thus the confidence-interval half- 
width (E 1 ) must he replaced hy 


A' = l.lli.lBp 3 


(Ell) 


■where now A' is no longer constant with P hut, on accoiant of Bp, 
actually Increases very rapidly for letrge values of yp corresponding 

to values of P near 1 . Thus a modified confidence hand is obtained 
whose divergence states that the amount of uncertainty increases with- 
out limit as one attenpts to estimate increasingly inrprohahle events. 

This also avoids the conflict with common sense mentioned in the sec- 
tion "Constant width of confidence interval.." 

The actual values of Bp are of interest and are given in the fol- 
lowing table for several important values of P and for the three sanple 
sizes for which they were conputed in appendix B: 


p 


n = 10 

n = 20 

n = 50 

0.95 

.99 

.999 

1.095 

1.595 

0.560 

.825 

1.208 

O.k^d 

.673 

.986 


In this table the values of Bp less than 1 indicate that the 

modified confidence band (eq. (Ell)) is better (i.e., narrower) than 
the Gumbel confidence band and vice versa for the values of Bp greater 

than 1 . Thus, the modified band is indicated to be considerably better 
in the region P = 0.95 'to 0.99 for sanples of 20 and 50. For sanples 
of 10 , the advanteige is less at P = 0.95 and becomes reversed in favor 
of the original Gumbel confidence band for P = 0.99 and hl^er values. 











NACA TW 3053 


6 ? 


The above con^arison remains exactly the same for any other confi- 
dence level, it being merely necessary to replace l.ll^l in equations (E1) 
and (Ell) by the corresponding veLLue y(C) determined from equation (Elt) . 
Thus, for the 95-percent level, y(0.95) = 3-06685 (ref. 5 , lect. 3 , 
table 3-l) • At each level the confidence intervals of the two methods 
are affected in the same ratio by such multipliers; that is, their ratio 
to each other remains Bp, regardless of confidence level C. ~ 


Con^iarlson of Confidence Intervals in Gumbel Method 

* 

and Method of Order Statistics 

Table IX shows the actual confidence Intervals (in terms of the 
scale parameter 3) for the two levels C = 0.68 aufl C = O.95 for 
the Gumbel method and as modified by the factor Bp anfl also con^ares 

these (where applicable) with the Intervals given by the order-statistics 
method. Except for sacqjles of 10, for which the Gumbel interval is apt 
to be narrower, the modification denoted by Bp, discussed in the pre- 
vious section, reduces the interval width for P = 0.99 (and less) by 
significant amounts - by about one-sixth or more for s amp l ea of 20 (col- 
umns 5 6) and by about one-third or more for samples of 30 (columns 8 

and 9) • These results are of course implied by the values of Bp given 

in the preceding section. Also, the order-statistics confidence interval 
is narrower than the (unmodified) Gumbel interval in many cases, for P 
not beyond 0.99 and sample size not below 20. However, it increases 
beyond the constant Gumbel width for larger probabilities, in agreement 
with theoretical requirements. At P = 0.99 or less, there are two 
additional features to be noted, (l) With increasing confidence level, 
the numerical, factor in the Gturibel interval Increases faster in 

either the modified interval A' or in the order-statistics interval 
(denoted by ^ in table IX) , so that both the modified method and the 

order-statistics method reduce the confidence interval of the Gumbel 
method by constantly increasing percentages as the confidence level 
increases. For example, for P = 0.99 and for samples of 20 the order- 
statistics Interval is about 11 percent narrower than the Gumbel inter- 
val for a confidence level of 68 percent and about 30 percent narrower 
for a level of 95 percent (columns 5 and 7)- (2) Similarly, the per- 

centage reduction Increases with" sample size. Thus, for P = 0.99 »nrl 
a confidence level of 68 percent, the reductions are 11 percent for 
samples of 20 and 29 percent for samples of 30 (colimnns 8 Rnri lo) . 



68 


NACA TW 3053 


REFERENCE 


1. KlmbELll, Bradford F.: Sufficient Statistical Estimation Functions 

for the Parameters of the Distribution of Maximum Values. Ann. 

Math. Statistics, vol. IJ, no. 3 7 S^t. 19^} PP* 299-309- 

2. Kimball, Bradford F.: An Approximation to the Sanpling Variance of 

an Estimated Maximum Value of a Given Frequency Based on Fit of 
DoTibly Exponential Distribution of Maximum Values. Ann. Math. 
Statistics, vol. 20, no. 1, Mar. 19h9, pp. UO-II 3 . 

5 . G-umbel> E. J.: Les Valeurs extremes des distributions statistiques . 

Ann. Inst. Henri Poincare, t. 5^ P't* 2, 1935^ PP- 115-158- 

4. Gumbel, E. J.: The Ret\im Period of Flood Flows. Ann. Math. 

Statistics, vol. 12, no. 2, June 19^1, pp. 163 -I 9 O. 

5 . Gumbel, Emil J.: The Statistical Theory of Extreme Values and Some 

Practical Applications. (To he published by the Hat. Bur. Standards.) 

6 . Press, Harry; The Application of the Statistical Theory of Extreme 

Values to Gust-Load Problems . HACA Rep. 991, 1950. (Supersedes 
HACA TN 1926 .) 

7 . National Bureau of Standards; Probability Tables for the Analysis of 

Extreme-Value Data. Appl. Math. Ser. 22, July 6 , 1955. 

8 . Kendall, Maurice G.; The Advanced Theosy of Statistics. Vols. I 

arid H. Charles Griffin and Co., Ltd. (London), 1948. 

9 . Wilks, S. S.; Order Statistics. Bull. Am. Math. Soc., vol. 54, 

Jan. 1948 , pp. 6 - 50 . 

10. Smith, J. H., and Jones, H. L.; The Weighed Mean of Random Observa- 

tions Arranged in Order of Size. (Unpublished paper.) 

11. Cramer, Harald; Mathematical Methods of Statistics. Princeton Uhiv. 

Press (Princeton), 1946. 

12. Lehman, E. H.; Notes on the Theory of Estimation. Lect. Notes, Univ. 

of Calif., 1949 - 50 , Ch. II, pp. 15-19- (Uupiblished mimeographed 
notes . ) 

13 . Gimbel, E. J.; The Statistical Forecast of Floods. The Graduate 

Facuaty of Political and Social Science, The New School for Social 
Research (New York), Dec. 1948. (Published by Ohio Water Resources 
Board, Columbus, Ohio.) 



MCA OT 3053 


69 


14. Anon.: EAraluatlon of Climatic Extremes. Rep. No. 175 ^ Environmental 

Protection Sectionj Office Quartermaster General, Dept. Army, 

Mar. 1951. 

15. Johnson, N. L.: Estimators of the Prohahillty of the Zero Class in 

Poisson ari(1 Certain Related Populations. Ann. Math. Statistics, 
vol. 22, no. 1, Mar. 1951# PP* 94-101. 

16. Lloyd, E. H. : Least-Squares Estimation of Location and Scale Param- 

eters Using Order Statistics. Biometrika, vol. 59, pts. 1 and 2, 

Apr. 1952, pp. 87-95. 

17. National Applied Mathematics Laboratories: ITahle of the First Moment 

of Ranked Extremes. Project S50-39, NACA and Nat. Bur. Standards, 
Sept. 20, 1951. 

18. LLeblein, Julius: On the Exact Evaluation of the Variances and Covar- 

iances of Order Statistics in Samples From the Extreme-Value Distrib- 
ution. Ann. Math. Statistics, vol. 24, No. 2, June 1953, PP. 282-287. 

19. Newman, F. W.: The Higher Trigonometry; Sipeiratlonals of Second 

Order. Macmillan and Bowes (Cambridge), I892, pp. 64-65. 

20. Mosteller, F.: On Some Useful "Inefficient" Statistics. Ann. Math. 

Statistics, vol. 17, no. 4, Dec. 1946, pp. 577-4o8. 

21. Gumbel, Emil J.: On Serial Numbers. Ann. Math. Statistics, vol. l4, 

no. 2, June 1943, PP- 165-178- 



TO 


NACA 0 !N 3053 


WORK S HBCT l._ DHEERMEnAUDH OF ESTIMUORS 


[Por Instructions see the section "Sunsaary of Procedures'^ 


Sorirce: HAXZA. - Langley — Sample ttt 


Con^uter; J. L. 


Date: ^/29/^g 


Record 


1 

2 

3 

k 

5 

6 

7 

8 
9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 
21 
22 
23 

Sus 


Obsenred 

extremes. 


o.75e 

.90 

1.08 

1.20 

1.38 

.a. 

.80 

.75 

.90 

1.20 

.88 

1.08 

1.15 

1.00 

1.31 

1.43 

.98 

1.0a 

1.01 

.93 

1.15 

•75 

1.16 

23.6a 


I. Subgroi^ sizes and proportionality factors: 

n-^-km+m' tn km/n o 0.76261 

-5x6 + 5 t^/k - 0.201)^16 

k — 5 Q — ^ m* — 5 


t’ » m'/n ° 0.21739 

(t’)^ “ 0.Qit726 


JIA. l-Iadn subgroup: 

If- 1 

- 0.35545 

hi, - -0.45928 


Weights and hi^ (from table l) 

23456 
0.22549 0.16562 0.12103 0.0^52 0.0468? 

-0.<5599 0.07319 0.12673 0.149^ 0.14581 


Chech sum 
1 

- 0.00001 ' 


Observations In Increasing order from 1 » 1 to 1 


Subgroup 

1^1 

1=2 



==5 


Chech 

sum 



1 

0.75 

o.ai 

0.90 

1.08 

1.20 

1.38 

6.12 

0.89669 

0.20978 

2 

.75 

.80 

.88 

.90 

1.08 

1.20 

5.61 

■85052 

.14168 

3 

k 

5 

k 

.98 

1.00 

1.02 

1.15 

1.31 

1.43 

6.89 

1.06127 

.13870 

Sun 

2.W 

2.61 

2.80 

5.15 

5.59 

4.01 

18.62 

2.80846 

0.49016 


T = ” 0-95616 + 0.16339 yp 


TTR . Remainder subgrox^p; 


Weights a^* and b^^* (frcan table l) 


1 - 

1 

2 

5 

4 

5 6 

Chech sum 

“1' - 

O.M893 

0.21:620 

0.16761 

0.10882 

0.05859 

0.99999 “ 1 

ti'- 

-0.50313 

0.0065!: 

0.1301:5 

0.18166 

0. 181:1:8 

0 

II 

0 


ObservatlozB 3^* in Increasing order freon 1 ■* 1 to 1 » m' 

0.90555 0.l85l(0 

T- - + ^T’i'='i')yp - 0-90555 + 0.1^ yp 


1^' 

*2’ 

^5* 



, Check 
sum 

0.75 

0.93 

1.01 

1.15 

1.16 

3.00 


TTT. Estimators: 


Sp - tr + fT' - 0.9291:6 + 0.1677!: yp 
u - 0.9291:6. B - O.I677I: 




WORK aarar e.- PREDOTH) VALUSSj COHFIDHfCB EAHD, ESFICIHfCI, AHD PIOTEDIO posmote 


(a.) Predicted values, confldrmce 1)01111, aod olflclancy 



(D 

0 

0 

0 

® 


0 

0 

p 

yp 

Predicted value® 
tp - 0 . 929*16 + 0.l67?4yp 

«6 


vr (tp) = 

68-p«ro«iit oonfidenca 
tana balf-vldth. 

(«o 

table m) 

Sfficlency, 
, _ ^ 

— - ■ 1 

(from table i,r 1 !) 

(from taibls m) 

• 

<y(tp) -\jw(lp) 

var (tp) 




t^A - 0.20416 

tt')2 . 0.04726 


3 - 0.l6774g 



0 . 367 S 8 

0 

‘^.929<i6e 

0.19117P® 

0.23i4op® 

0.04997P® 

0.05756 

0.040203® 

0.965 

.50 

.36651 

• 9909*ib 

. 23169^2 

. 278700 ^ 

. 0605 IP® 

.04136 

. 059943 ^ 

.991 

.90 

2.25037 

1.30694g 

1 . 00065^2 

1 . 22651 ?^ 

. 26234 ?^ 

•0859b 

. 232353 ® 

.885 

.95 

2.97020 

l.4276fite 

1 . 54171 $^ 

1.90549P^ 

.4047ip® 

.10678 

• 547773 ® 

.659 

.99 

4.60016 

1. 701098 

3.27250P® 

4.07D62P^ 

.660453^ 

.1556g 

. 710533 ® 

.826 

.999 

6.90726 

2.060068 

6.92044P® 

S.63l73p2 

1 . 621760 ^ 

.2264g 

1 . 463643 ® 

.805 

1 


^(0,1677*+b) 

.13196yp2f|2 

.I6665yp2p2 

.03462yp^p® 


•02645yp®3® 

.759 





("b) Plotting posltionj 


-L ue u 


‘=^J 


Rank, 

T 

Obearvad 

extrena 

Plotting poeltion, 
r/(n + 1) 

Bank, 

r 

Obaarvad 

oxtruu 

1 

0.736 

0.0417 

u 

i.ooe 

2 

•T5e 

. 0^3 

12 

1.016 

? 


.1230 


1.026 

4 

lOOg 

.1667 

l 4 

1.0«b 

5 

.818 

.20® 

15 

i.oSe 

6 

. 88 b 

.2500 

16 

1.158 

I 

• 90 s 

.2917 

17 

1 . 15 b 

8 

-908 

.3535 

18 

1.166 

9 

• 9 ^ 

.3730 

19 

1 . 20 s 

10 

• 98 a 

.4167 

20 

1 . 20 s 


®letlaate of paraaator u. 

of parameter p. 


Plotting position, 
r/(n + 1) 

Rosk^ 

r 

Obaarvad 

axtrama 

Plotting position, 
r/(n + 1) 

0.4503 

21 

1 . 3 l£ 

O.&730 

.5000 

22 

i.36e 

.9167 

.3417 

■3^3 

23 

i.‘*3B 

■ 9505 

.6230 

Sum 

23.62 


.6667 




.70® 




.7500 




.7917 

.fe33 









MCA !M 3053 











72 


NACA TN 3053 


L = I. 


AHD 


TABLE I 

WEIGHTS -FOR MINIMJM-VARIAMCE, BBBIASED, LINEAR ORDER-STATISTICS KTIMATOR 
= ^ OF PERCENTAGE POINTS gp = u + Py^, 

VARIANCE var (gp) = = (Aj^yp^ + B^^yp + FOR SAMPLE 

SIZE n = 2 TO 6 AND ^ < . . . ^ 


n 

■ 


Weights, 8Lj_ + 

i>iyp^ of 




■ 



"^3 




2 


0.91637 

0.08363 







-0.72135 

0.72135 







(o.71l86y^ 

p 

/ - 0.128643 

rp + o'. 659: 

>5)p^ 



3 


0.65632 

0.25571 

0.08797 






-0.63054 

0.25582 

0.37473 


• 



% 

(o.54472yj 

,2 + 0.04954-3 

p + 0.402i 

J6)p2 





0.51100 

0.26394 

0.15568 

0.07138 




.’=1 

-0.55862 

0.08590 

0.22392 

0.24880 





(o.22528yj 

3^ + 0.069383 

^p + 0.29346) p2 



5 


0.41895 

0.24628 

0.16761 

0.10882 

0.05855 




-0.50315 

0.00653 

0.13045 

0.18166 

0.18448 



% 

^0 . i6665y] 

+ o-.o6798y 

■p + 0.2514 

0)p2 



6 


0.355^5 

0.22549 

0.16562 

0.12105 

0.08552 

0.04887 



-0.45928 

-0.03599 

0.07319 

0.12673 

0.14955 

0.14581 


^6 

(0.13196yj 

1 

2 + o.o6275y 
1 

+ o.i 9 n 
1 

■7)p^ 
















s 


OMLE n 

HEAK3, VARIABfCB3, AHD roVARIAI(CB3 OF ORDER STAI33EICS IS aAMPi;EB OP n PE»M 
REDUCED EXTRSJffi-VALIIB DCKIIIIBUTION P(y) = «xp (-e-y) FOR n - 2 TO 6 

ARD 7i g yg ^ ^ 


n 

1 

Means, 


Varlanoefl and oovaxlanoes, a c 

'J1 



J = 1 

J - 2 

i » 3 

J 0 4 

i - 3 

J - 6 

2 

1 

- 0 . 115^152 

0.68402804 

0.48045301 






2 

1.27036285 


1 . 64493 ^ 





5 

1 

-0.40361559 

0.44849796 

0.30137144 

0.24375810 





2 

.4591^3263 


.65^235 

.54629438 





5 

1.67582795 



1.64493407 




k 

1 

-0.57351263 

0.34402417 

0.22455344 

0.17903454 

0.15388918 




2 

.10608352 


.41353113 

.33720966 

.29271188 




? 

.81278175 



.65160236 

.57432356 




4 

1.96351003 




1.64495407 



5 

1 

-0.69016715 

0.28486447 

0.18202536 

0.14356737 

0.12257865 

0.10901329 



2 

-,10689454 


.50849748 

.24676731 

.21226644 

.189675^ 



3 

.42555061 



.403981^ 

.35267072 

.64907319 

.31716095 



k 

3 

1.07093582 

2.18663358 




.58W1519 
1 . 64495407 


6 

1 

-0.77729368 

0.2463820 

0.1549674 

. 24 ^^ 

0.1212161 

0.1029164 

0.0911619 

0.0828542 


2 

-.25453448 


.1967062 

.1680628 

.1494532 

.1361910 


? 

.1883853^ 



.2976159 

.2561660 

.2288790 

. 2092 ^ 

,3520431 


k 

.662^588 




.4018552 

.3614333 


5 

1.27504579 





.64769^ 

.5998567 


6 

2.36897515 





1.6449341 




HE VOVH 




'EAELE III 


4 ^ 

VARIANCES AND EEPICIMCIES OF MIHII4JM-VAEIANCE, UNBIASED, LINEAE, ORDER -STATISTICS 
ESTBRTOR L - Ip OF ^HE PARAMETER FOR SELECTED PROBABILITY LEVELS P 

AND FOE SAWLE SIZE n = 2 TO 6 



taBle ^ 


(a) Variances (-units of p2) 


p 

yp 



% 


% 


0.36768 

.40 

.50 

.60 

rrrx 

.80 

.90 

.95 

.975 

.99 

.999 

1 

0 

.08742 

.36651 

.67173 

1 r\trr\riT 

j-.uyv:?;; 

1.49994 

2.25057 

2.97020 

3.67625 

4.60016 

6.90726 

(00) 

^1.10866 

I. 15&5 

1^37873 

1.72827 

rro 

4: ic. 

5.24743 

5.34410 

7.99666 

II . 21444 - 
16.53798 
53.66365 

'^. 60793 yp^ 

^0.65955 

.65374 

.70802 

.89454 

T r\ Qrr rr rs 

2.06814 

5.97502 

6.55752 

9.80724 

15.13171 

55.735^ 

^.jnd6y/ 

^ 0.40286 

.409^ 

.46752 

.39168 

1.25271 

2.26002 

3.59108 

5.24369 

7 . 925^6 

17.19133 

‘^.544713^2 

^ 0.29546 

•30325 

.34915 

.44172 

K /~L 

.90437 

1.59046 

2.48700 

3.59317 

5.37994 

ii.5^9 

*^. 22528 yp 2 

^0.23140 
.23861 
. .27870 

.55225 

1i ry Ctcr 

.70829 

1.22831 

1.90349 

2.73352 

4.07062 

8.65174 

^.l 6665 y-p 2 

‘^.19117 

.19766 

.25189 

.29286 

1 t 

.58218 

1.00065 

1.54171 
2.20527 
5 . 27250 
6.92044 

°. 15196 yp 2 


Cramer -Rao lower 1)0111111 is 
and. n Is saiqile size. 


0 __ 

"liB 


0 In. -where 
-o/ ' 


■o 


(o.6o7q^v_ 2 + o,sii40i)-v_ + i.ioe66')0^ 


in 


t>These give the variances of the order-statistics estimator of the parameter u. 

'^The -variances for P = 1 are n,1 1 infinite . Ej^iressing them hy meanB of the dcanlnant 

permits finding their ratios to obtain the efficiencies . Also, -the coefficients 
are the -variances for the order-statistics estima-tor of the parameter 


-Germ in vic- 
2 
yp 


KACA TN 5053 




TABLE III. - Concluded 

VARIANCES AND EFFICIENCIES OF MENIMJM-VARIANCEj UNBIASED, LINEAR, ORDER-STATISTICS 
ESTIMATOR L = Ip OF THE PARAMETER |p FOR SELECTED PROBABILITY LEVELS P 

AND FOR SAMPLE SIZE n = 2 TO 6 


(l) Efficiencies 


p 

Yp 

^ - 1 


s - 1 % l % 

E5 = 1 Q0/Q5 


0.36788 

0 

^ 0.8405 

^ 0.9173 

^ 0 . 944-5 

^.9582 

®o .9666 

.i^o 

.08742 

.8859 

. 9 h 2 ± 

.9612 

.^08 

.9766 

■50 

.56651 

.9737 

.9834 

.9872 

.9894- 

.9909 

,60 

.67175 

.9662 

• 9737 

.9781 

.9815 

.9^6 

.70 

1.05093 

.8900 

.9284 

.9450 

.9548 

.9615 

.80 

1.49994 

■ 7851 

■ S 611.1 

. PQ'vt 

. 01 7 n 

. 9 P 97 

.90 

2.25057 

.6722 

.7^ 

.8400 

.8762 


■ 95 

2.97020 

.6099 

.7425 

. 8 o 4 o 

.8404 

.8647 

.975 

3.67625 

.5717 

.7129 

.7803 

.0205 

.8475 

.99 

4.60016 

.5599 

.6872 

.7592 

.8027 

.8521 

.QQQ 
’ ^ ^ ^ 

6 . Qnvo6 

kOOn 

• 1 

1 

* 1 

*11^ 

At r \’7 

• \ 

1 

(“) 

^.4270 

^.5879 

^.6746 

^.7296 

^.7678 


& HTL. 4 ^ 4 AS 4-Vin r<T»i^ciT»-a+D + '1 fl*M r»fl ^g' Mmfl +.nT* n-P -hhp T^T* Am ft+.PT* . 

I II lib K ^xvcr ULLC Ci i V4J. uxxv- w* *-*w* - w wwv v— JT 

limiting efficiency as P approaches 1 . These values are also the efficiencies for 
the estimator of the parameter p. 


\J\ 


NACA TN 3055 




76 


NACA TN 5055 


TABLE IV

EFFICIENCY OF ORDER-STATISTICS ESTIMATORS FOR VARIOUS SAMPLE
SIZES n = km + m' PARTITIONED INTO SUBGROUPS AS
INDICATED FOR P = 0.99 AND P = 1

                                Efficiency, percent
n            km + m'            P = 0.99   P = 1

2 or k x 2                      54.0       42.7
3 or k x 3                      68.7       58.8
4 or k x 4                      75.9       67.5
5 or k x 5                      80.3       73.0
6 or k x 6                      83.2       76.8
a7                              70.5       60.7
b8                              75.5       63.8
9                               77.7       69.7
10                              80.3       73.0
11                              81.9       75.0
12                              83.2       76.8
13                              77.5       69.1
c14                             74.7       66.7
15           3 x 5              80.3       73.0
16           2 x 6 + 4          81.3       74.2
17           2 x 6 + 5          82.3       75.6
18           3 x 6              83.2       76.8
19           3 x 5 + 4          79.5       71.7
20           4 x 5              80.3       73.0
21           3 x 6 + 3          80.8       75.6
22           3 x 6 + 4          81.8       74.9
23           3 x 6 + 5          82.6       75.9
24           4 x 6              83.2       76.8
25           5 x 5              80.3       73.0
26           4 x 6 + 2          79.9       72.5
27           4 x 6 + 3          81.3       74.5
28           4 x 6 + 4          81.9       75.5
29           4 x 6 + 5          82.7       76.1
30           5 x 6              83.2       76.8
31           5 x 5 + 6          80.8       75.7
32           5 x 6 + 2          80.5       75.1
33           5 x 6 + 3          81.6       74.7
34           5 x 6 + 4          82.3       75.6
d35          7 x 5              80.3       73.0
36           6 x 6              83.2       76.8
37           7 x 5 + 2          78.2       70.5
38           6 x 6 + 2          80.9       75.7
39           6 x 6 + 3          81.9       75.0
e40          8 x 5              80.3       73.0
61           11 x 5 + 6         80.6       75.5

aIf partition is 7 = 1 x 4 + 3, then efficiencies are ... percent
for P = 0.99 and 63.4 percent for P = 1.

bIf partition is 8 = 2 x 4, then efficiencies are 75.9 percent
for P = 0.99 and 67.5 percent for P = 1.

cIf partition is 14 = 2 x 5 + 4, then efficiencies are 79.1 percent
for P = 0.99 and 71.3 percent for P = 1.

dIf partition is 35 = 5 x 6 + 5, then efficiencies are 82.8 percent
for P = 0.99 and 76.2 percent for P = 1.

eIf partition is 40 = 6 x 6 + 4, then efficiencies are 82.4 percent
for P = 0.99 and 75.7 percent for P = 1.
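Table IV gives only the partition n = km + m' and the resulting efficiency; it does not state here how individual observations are assigned to subgroups. The sketch below assumes, purely for illustration, that subgroups are taken from consecutive observations in the order recorded, and splits a sample according to a tabulated partition while checking that the subgroup sizes add up to n. The helper name `partition_sample` is hypothetical.

```python
from typing import List, Sequence


def partition_sample(sample: Sequence[float], k: int, m: int, m_rem: int = 0) -> List[List[float]]:
    """Split a sample into k subgroups of size m plus one subgroup of size m_rem.

    Hypothetical helper. Assumes subgroups are formed from consecutive observations
    in the order given and that the partition n = km + m' must use every observation.
    """
    n = len(sample)
    if k * m + m_rem != n:
        raise ValueError(f"partition {k} x {m} + {m_rem} does not equal n = {n}")
    groups = [list(sample[i * m:(i + 1) * m]) for i in range(k)]
    if m_rem:
        groups.append(list(sample[k * m:]))  # remainder subgroup of size m'
    return groups


# Example: n = 16 partitioned as 2 x 6 + 4, as listed in Table IV.
data = list(range(1, 17))
subgroups = partition_sample(data, k=2, m=6, m_rem=4)
print([len(g) for g in subgroups])  # [6, 6, 4]
```

Any other assignment rule (for example, random allocation) would only change how the slices are drawn; the size check against km + m' stays the same.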





















TABLE V

BIASES, MEAN SQUARE ERRORS, AND RELATIVE EFFICIENCIES OF PROPOSED ORDER-STATISTICS ESTIMATOR
TO GUMBEL ESTIMATOR BASED ON EMPIRICAL SAMPLING RESULTS OBTAINED FROM N SAMPLES,
FOR P = 0.95 AND SAMPLE SIZE n = 10, 20, AND 30

[Values of R shown in fig. 6]

Each set comprises 100 samples; bias and MSE entries are average values for each set of 100 samples.
N = 1,200, 600, and 400 and Q = 0.95174β², 0.47587β², and 0.3084β² for n = 10, 20, and 30, respectively.

           Bias,                            Mean square error (MSE),         Relative efficiency,
           units of β                       units of β²                      R = MSE/Qa
Set        n = 10     n = 20     n = 30     n = 10     n = 20     n = 30     n = 10     n = 20     n = 30
(1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)        (10)

1          -0.25961   -0.08075   -0.03771   0.93875    0.56707    0.32762    0.986      1.192      1.062
2          -.26375    ...        -.06069    1.13608    .39516     .25458     1.194      ...        .826
3          -.16007    -.17221    -.06280    .85747     .52061     .39850     .901       ...        1.292
4          -.34090    -.05595    -.09250    ...        .59028     .41688     .847       1.240      1.352
5          -.16116    -.20845               .81749     .58173                .859       1.222
6          -.13951    ...                   ...        .52853                1.196      1.111
7          -.10497                          .88266                           .927
8          -.15613                          .92989                           .977
9          -.21475                          .87389                           .918
10         -.18451                          1.22356                          1.286
11         -.29129                          ...                              1.036
12         -.32256                          1.02068                          1.072
Average    -.21827    -.11464    -.06842    .96914     .53056     .34940     1.018      1.115      1.133

Proportion of sets favorable to proposed estimator (R > 1), columns (8) to (10):
5 out of 12 (n = 10); 5 out of 6 (n = 20); 3 out of 4 (n = 30)

aFor explanation of Q, see equation (B8) in appendix B and accompanying discussion.
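Columns (8) to (10) of Table V are defined by R = MSE/Q. As a check, the sketch below recomputes every entry of column (9) from the n = 20 mean square errors in column (6) and the tabulated Q = 0.47587β², then counts the sets with R > 1 and evaluates R for the column average. It uses only numbers printed in the table.

```python
# Relative efficiency R = MSE/Q for the n = 20 sets of Table V (units of beta**2 cancel).
Q_N20 = 0.47587  # Q for n = 20, N = 600 (Table V heading)
mse_n20 = [0.56707, 0.39516, 0.52061, 0.59028, 0.58173, 0.52853]  # column (6), sets 1-6

ratios = [round(mse / Q_N20, 3) for mse in mse_n20]
favorable = sum(r > 1 for r in ratios)
average_ratio = round(sum(mse_n20) / len(mse_n20) / Q_N20, 3)

print(ratios)                              # [1.192, 0.83, 1.094, 1.24, 1.222, 1.111]
print(f"{favorable} out of {len(ratios)}")  # 5 out of 6
print(average_ratio)                       # 1.115
```

The same two lines apply to the n = 10 and n = 30 columns by substituting Q = 0.95174 or 0.3084 and the corresponding MSE values.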




TABLE VI

BIAS AND EFFICIENCY CHARACTERISTICS OF ORIGINAL GUMBEL ESTIMATOR (G,n) AND
SIMPLIFIED (ASYMPTOTIC) FORM (G), FOR SAMPLE SIZE n = 10, 20, AND 30
AND n INFINITE, FOR P = 0.95, 0.99, AND 1

         Bias,               Variance,           Mean square error   Relative efficiency of
         units of β          units of β²         (MSE), units of β²  order-statistics estimator
                                                                     to Gumbel estimator,
                                                                     R = MSE/Qa
P        G,n       G         G,n       G         G,n       G         G,n       G
(1)      (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)

                  n = 10 (computed from empirical sampling results)

0.95     0.64      -0.22     1.48      0.92      1.89      0.97      1.99      1.02
.99      1.02      -.37      3.32      1.97      4.36      2.10      2.14      1.03
b1.00    .23y_P    -.09y_P   .15y_P²   .08y_P²   .20y_P²   .09y_P²   2.38      1.06

                  n = 20 (computed from empirical sampling results)

0.95     0.42      ...       ...       0.52      0.87      0.53      1.85      1.11
.99      .66       ...       ...       1.12      1.99      1.16      1.96      1.14
b1.00    .15y_P    ...       ...       .05y_P²   .09y_P²   .05y_P²   2.18      1.19

                  n = 30 (computed from empirical sampling results)

0.95     0.33      -0.07     0.43      0.34      0.54      0.35      1.76      1.15
.99      .53       -.12      .96       .75       1.24      .76       1.89      1.16
b1.00    .12y_P    ...       ...       .03y_P²   ...       .03y_P²   2.12      1.21

                  n infinite (computed from theory)

0.95     0         0         0         0         0         0         1.237     1.237
.99      0         0         0         0         0         0         1.290     1.290
b1.00    0         0         0         0         0         0         1.389     1.389

aFor values of Q, see table V, headings for columns 8, 9, and 10.

bFor P = 1, all quantities except relative efficiency are infinite for finite sample size. Expressing them in terms of y_P (which is also infinite) permits comparison for values of P very near to 1.
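The bias, variance, and mean-square-error columns of Table VI are tied together by the usual decomposition MSE = variance + bias². The sketch below applies it to the n = 10 and n = 30 rows where all three quantities are legible. Because the tabulated entries are rounded to two decimals, the recomputed MSE agrees with the printed value only to within about ±0.01.

```python
def mean_square_error(bias: float, variance: float) -> float:
    """MSE = variance + bias**2 (units of beta**2 when bias is in units of beta)."""
    return variance + bias ** 2


# (sample size n, P, estimator): (bias, variance, tabulated MSE), read from Table VI.
TABLE_VI = {
    (10, 0.95, "G,n"): (0.64, 1.48, 1.89),
    (10, 0.95, "G"): (-0.22, 0.92, 0.97),
    (10, 0.99, "G,n"): (1.02, 3.32, 4.36),
    (10, 0.99, "G"): (-0.37, 1.97, 2.10),
    (30, 0.95, "G,n"): (0.33, 0.43, 0.54),
    (30, 0.95, "G"): (-0.07, 0.34, 0.35),
    (30, 0.99, "G,n"): (0.53, 0.96, 1.24),
    (30, 0.99, "G"): (-0.12, 0.75, 0.76),
}

max_gap = 0.0
for key, (bias, variance, mse_tabulated) in TABLE_VI.items():
    mse = mean_square_error(bias, variance)
    max_gap = max(max_gap, abs(mse - mse_tabulated))
    print(key, f"recomputed {mse:.4f}", f"tabulated {mse_tabulated}")

print(f"largest discrepancy: {max_gap:.4f}")  # stays below 0.01 (rounding of inputs)
```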
























































TABLE VII

PROBABILITIES OF EXTREMES

(a) Plotting positions

Cumulative    Extremes,    Plotting position,
frequency, m  x            m/(N + 1)

1             .75          0.0417
2             .75          0.0833
3             ...          0.1250
4             ...          0.1667
5             ...          0.2083
6             .90          0.2500
7             .90          0.2917
8             .93          0.3333
9             ...          0.3750
10            1.00         0.4167
11            ...          0.4583
12            1.02         0.5000
13            ...          0.5417
14            ...          0.5833
15            ...          0.6250
16            ...          0.6667
17            1.10         0.7083
18            ...          0.7500
19            1.20         0.7917
20            1.31         0.8333
21            ...          0.8750
22            1.43         0.9167
23            ...          0.9583

Arbitrary mean: 0

Data: NACA Langley
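The plotting positions in Table VII(a) are m/(N + 1) for ranks m = 1, ..., N, here with N = 23. To place a point on extreme-probability paper, each position is commonly converted to the reduced variate y = -ln(-ln p) of the largest-value (Gumbel) distribution. That conversion is the standard one for maxima and is stated here as an assumption, since the work sheet does not print it. Function names are illustrative.

```python
import math


def plotting_positions(n_obs: int) -> list:
    """Plotting positions m/(N + 1) for ranks m = 1, ..., N, as on the work sheet."""
    return [m / (n_obs + 1) for m in range(1, n_obs + 1)]


def reduced_variate(p: float) -> float:
    """Reduced variate y = -ln(-ln p) for the largest-value (Gumbel) distribution."""
    return -math.log(-math.log(p))


N = 23  # sample of 23 maximum acceleration increments
positions = plotting_positions(N)
for m in (1, 12, 23):
    p = positions[m - 1]
    print(m, f"{p:.4f}", f"{reduced_variate(p):.3f}")
```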






TABLE VII. - Concluded

PROBABILITIES OF EXTREMES

(b) Mean and standard deviation, parameters, line of
expected extremes, and confidence band

[Work-sheet form; most handwritten entries are illegible in the source.]

I. Mean and standard deviation: N = 23; arbitrary mean, x_a; true mean, x̄; n/(N - 1); standard deviation, s_x.

II. Parameters: 1/α = s_x/σ_N; u = x̄ ∓ ȳ_N(1/α) (mode).

III. Line of expected extremes: x = u ± (1/α)y.
NOTE: Upper sign used for maxima, lower sign for minima.

IV. Half-width of 0.68269 confidence band, ΔX:
For largest value, ΔX = 1.141(1/α).
For next-to-largest value, ΔX = 0.759(1/α).

V. Expected extreme in T periods (years, etc.), x_T.
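Work sheet VII(b) reduces to a few formulas: 1/α = s_x/σ_N, u = x̄ - ȳ_N(1/α) for maxima, the line of expected extremes x = u + (1/α)y, and a largest-value half-width of 1.141(1/α). The sketch below covers the maxima case only. Two assumptions go beyond the work sheet: the reduced variate for return period T is taken as y_T = -ln(-ln(1 - 1/T)), and ȳ_N and σ_N must be supplied from Gumbel's tables for sample size N. The inputs in the example are hypothetical and are not the work-sheet values.

```python
import math

# Largest-value (maxima) case only; the work sheet's lower signs for minima are not handled.


def gumbel_parameters(mean_x, s_x, y_bar_n, sigma_n):
    """Return (u, 1/alpha) from 1/alpha = s_x / sigma_N and u = x_bar - y_bar_N * (1/alpha).

    y_bar_n and sigma_n are the mean and standard deviation of the reduced extremes for
    sample size N; they must be read from Gumbel's tables (not computed here).
    """
    inv_alpha = s_x / sigma_n
    u = mean_x - y_bar_n * inv_alpha
    return u, inv_alpha


def expected_extreme(u, inv_alpha, return_period):
    """Line of expected extremes x = u + (1/alpha) y, evaluated at y_T = -ln(-ln(1 - 1/T))."""
    y_t = -math.log(-math.log(1.0 - 1.0 / return_period))
    return u + inv_alpha * y_t


def half_width_largest(inv_alpha):
    """Half-width of the 0.68269 band for the largest value, 1.141 (1/alpha)."""
    return 1.141 * inv_alpha


# Hypothetical inputs, for illustration only.
u, inv_alpha = gumbel_parameters(mean_x=1.0, s_x=0.2, y_bar_n=0.5, sigma_n=1.0)
print(round(u, 4), round(inv_alpha, 4))
print(round(expected_extreme(u, inv_alpha, 10), 4))
print(round(half_width_largest(inv_alpha), 4))
```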



TABLE VIII

EMPIRICAL SAMPLING VALUES OF FIRST AND SECOND MOMENTS OF SAMPLE MEAN ȳ
AND STANDARD DEVIATION FOR SAMPLES OF n = 10, 20, AND 30
COMPARED WITH CORRESPONDING THEORETICAL
VALUES WHERE OBTAINABLE



TABLE IX

COMPARISON OF CONFIDENCE-INTERVAL HALF-WIDTHS FOR EXTREME PREDICTIONS GIVEN BY GUMBEL METHOD,
BY MODIFIED METHOD, AND BY ORDER-STATISTICS METHOD, FOR SAMPLES OF n = 10, 20, AND 30
AND FOR CONFIDENCE LEVELS OF 68 PERCENT AND 95 PERCENT

        n = 10                           n = 20                           n = 30
P       Gumbel     Modified   Order-     Gumbel     Modified   Order-     Gumbel     Modified   Order-
        method     method     statistics method     method     statistics method     method     statistics
                   (a)        method                (a)        method                (a)        method
                              (b)(c)                           (b)(c)                           (b)(c)
(1)     (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)        (10)

68-percent confidence level (A = 1.141β; A' = 1.141B_Pβ)

0.95    1.141β     ...        0.976β     1.141β     0.639β     0.690β     1.141β     0.768β     0.555β
.99     1.141β     1.247β     1.427β     1.141β     .941β      1.009β     1.141β     1.124β     .809β
.999    1.141β     1.817β     2.000β     1.141β     1.306β     1.471β     ...        ...        1.176β

95-percent confidence level (A = 3.067β; A' = 3.067B_Pβ)

0.95    3.067β     2.297β     1.970β     3.067β     1.717β     1.454β     3.067β     2.065β     1.181β
.99     3.067β     3.355β     2.899β     3.067β     2.550β     2.151β     3.067β     3.022β     1.742β
.999    3.067β     ...        4.245β     3.067β     3.707β     3.159β     ...        ...        2.554β

aValues of B_P are based on empirical sampling methods. See appendix B.

bFor explanation of quantity of form (1/k)Q_m, see equation (B8) in appendix B and accompanying discussion.

cBased on assumption of normality for order-statistics estimator. For discussion, see appendix E.

dApplies only for P ≤ n/(n + 1) = 0.909, 0.952, and 0.968 for n = 10, 20, and 30, respectively. See footnote 31 (appendix E).
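Two relations in Table IX can be restated directly. The half-widths A and A' scale β by fixed factors (1.141 at the 68-percent level, 3.067 at the 95-percent level), with A' also multiplied by the empirical factor B_P. Footnote d limits use to P ≤ n/(n + 1). The minimal sketch below encodes both; the function names are hypothetical, and B_P has to be supplied from the empirical sampling results cited in footnote a, since it is not computed here.

```python
# Factors multiplying beta in Table IX: A = factor * beta, A' = factor * B_P * beta.
FACTOR = {68: 1.141, 95: 3.067}  # keyed by confidence level, percent


def half_width_a(level: int, beta: float) -> float:
    """A = factor * beta."""
    return FACTOR[level] * beta


def half_width_a_prime(level: int, beta: float, b_p: float) -> float:
    """A' = factor * B_P * beta; B_P is an empirical-sampling value (footnote a)."""
    return FACTOR[level] * b_p * beta


def max_probability(n: int) -> float:
    """Upper limit P <= n/(n + 1) from footnote d."""
    return n / (n + 1)


for n in (10, 20, 30):
    print(n, round(max_probability(n), 3))
# 10 0.909
# 20 0.952
# 30 0.968
```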


[Figure: density function, f(x)]

[Figure: efficiency, percent, plotted against probability, P]

Figure.- Comparison of efficiencies of order-statistics estimator for samples of sizes 2 to 6, or for samples of any size if broken into equal subgroups of 2 to 6. (Data from table III(b).)


[Figure: observed variate plotted on extreme-probability paper against reduced variate and return period, T]

Figure 4.- Graphical analysis of a sample of 23 maximum acceleration increments by method of order statistics. (Data from work sheet 2.)

Developed from Dr. E. J. Gumbel's Extreme Probability Paper by the Climatology Unit, Environmental Protection Section, Research and Development Branch, Military Planning Division, Office of The Quartermaster General.


[Figure: observed variate, x, plotted against reduced variate and return period, T]

Figure 5.- Comparison of order-statistics and Gumbel methods of analyzing a sample of 23 maximum acceleration increments, showing 68-percent control curves. Eight observations at lower end omitted to avoid crowding. (Data from work sheet 2 and table VII(b).)



[Figure: relative efficiencies for sets of 100 samples, n = 10, 20, and 30, with line of equal efficiency]

Figure 6.- Comparison of empirical sampling values of relative efficiencies of proposed order-statistics estimator to Gumbel estimator, for P = 0.95 and sample sizes n = 10, 20, and 30. (Data from table V, columns 8, 9, and 10.)



